Use V_IRQ, V_INTR_VECTOR and V_TPR to offload APIC interrupt delivery to the
processor. Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector
and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes
care of injecting this vector when the guest is able to receive it.
Legacy PIC interrupts are still delivered via the event injection mechanism.
This is because the vector injected by the PIC must reflect the state of its
pins at the time the CPU is ready to accept the interrupt.
Accesses to the TPR via %CR8 are handled entirely in hardware. This requires
that the emulated TPR must be synced to V_TPR after a #VMEXIT.
The guest can also modify the TPR via the memory mapped APIC. This requires
that the V_TPR must be synced with the emulated TPR before a VMRUN.
Set the 'vmexit->inst_length' field properly depending on the type of the
VM-exit and ultimately on whether nRIP is valid. This allows us to update
the %rip after the emulation is finished so any exceptions triggered during
the emulation will point to the right instruction.
Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability
is available. The effective segment field in EXITINFO1 is not valid without
this capability.
Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the
VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.
Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2
fields in bhyve(8).
- Don't enable the HLT intercept by default. It will be enabled by bhyve(8)
if required. Prior to this change HLT exiting was always enabled making
the "-H" option to bhyve(8) meaningless.
- Recognize a VM exit triggered by a non-maskable interrupt. Prior to this
change the exit would be punted to userspace and the virtual machine would
terminate.
AMD processors that have the SVM decode assist capability will store the
instruction bytes in the VMCB on a nested page fault. This is useful because
it saves having to walk the guest page tables to fetch the instruction.
vie_init() now takes two additional parameters 'inst_bytes' and 'inst_len'
that map directly to 'vie->inst[]' and 'vie->num_valid'.
The instruction emulation handler skips calling 'vmm_fetch_instruction()'
if 'vie->num_valid' is non-zero.
The use of this capability can be turned off by setting the sysctl/tunable
'hw.vmm.svm.disable_npf_assist' to '1'.
Repurpose the V_IRQ interrupt injection to implement VMX-style interrupt
window exiting. This simply involves setting V_IRQ and enabling the VINTR
intercept. This instructs the CPU to trap back into the hypervisor as soon
as an interrupt can be injected into the guest. The pending interrupt is
then injected via the traditional event injection mechanism.
Rework vcpu interrupt injection so that Linux guests now idle with host cpu
utilization close to 0%.
Allow intercepts and irq fields to be cached by the VMCB.
Provide APIs svm_enable_intercept()/svm_disable_intercept() to add/delete
VMCB intercepts. These APIs ensure that the VMCB state cache is invalidated
when intercepts are modified.
Each intercept is identified as a (index,bitmask) tuple. For e.g., the
VINTR intercept is identified as (VMCB_CTRL1_INTCPT,VMCB_INTCPT_VINTR).
The first 20 bytes in control area that are used to enable intercepts
are represented as 'uint32_t intercept[5]' in 'struct vmcb_ctrl'.
Modify svm_setcap() and svm_getcap() to use the new APIs.
Prior to this change an ASID was hard allocated to a guest and shared by all
its vcpus. The meant that the number of VMs that could be created was limited
to the number of ASIDs supported by the CPU. It was also inefficient because
it forced a TLB flush on every VMRUN.
With this change the number of guests that can be created is independent of
the number of available ASIDs. Also, the TLB is flushed only when a new ASID
is allocated.
neel [Mon, 25 Aug 2014 00:58:20 +0000 (00:58 +0000)]
An exception is allowed to be injected even if the vcpu is in an interrupt
shadow, so move the check for pending exception before bailing out due to
an interrupt shadow.
Change return type of 'vmcb_eventinject()' to a void and convert all error
returns into KASSERTs.
Fix VMCB_EXITINTINFO_EC(x) and VMCB_EXITINTINFO_TYPE(x) to do the shift
before masking the result.
imp [Thu, 14 Aug 2014 04:21:31 +0000 (04:21 +0000)]
Add AIC to at91sam9260 support, now that it is needed for multipass to
work. This gets my AT91SAM9260-based boards almost booting with
current in multi pass. The MCI driver is broken, but it is equally
broken before multi-pass.
imp [Thu, 14 Aug 2014 04:20:13 +0000 (04:20 +0000)]
From https://sourceware.org/ml/newlib/2014/msg00113.html
By Richard Earnshaw at ARM
>
>GCC has for a number of years provides a set of pre-defined macros for
>use with determining the ISA and features of the target during
>pre-processing. However, the design was always somewhat cumbersome in
>that each new architecture revision created a new define and then
>removed the previous one. This meant that it was necessary to keep
>updating the support code simply to recognise a new architecture being
>added.
>
>The ACLE specification (ARM C Language Extentions)
>(http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.set.swdev/index.html)
>provides a much more suitable interface and GCC has supported this
>since gcc-4.8.
>
>This patch makes use of the ACLE pre-defines to map to the internal
>feature definitions. To support older versions of GCC a compatibility
>header is provided that maps the traditional pre-defines onto the new
>ACLE ones.
Stop using __FreeBSD_ARCH_armv6__ and switch to __ARM_ARCH >= 6 in the
couple of places in tree. clang already implements ACLE. Add a define
that says we implement version 1.1, even though the implementation
isn't quite complete.
pfg [Wed, 13 Aug 2014 21:18:31 +0000 (21:18 +0000)]
Use "NO NAME" as the default unnamed label.
Microsoft recommends avoiding the use of spaces in the
string structures for FAT. Unfortunately they do just
that by default in the case of unlabeled filesystems.
Follow the default MS behavior to avoid confusion in
common tools like file(1). This was actually the
default behavior before r203868.
Obtained from: NetBSD (CVS rev. 1.39)
MFC after: 3 days
dim [Wed, 13 Aug 2014 16:42:44 +0000 (16:42 +0000)]
Supplement r259111 by also using correct casts in gcc's emmintrin.h for
the first argument of the following builtin function:
* __builtin_ia32_psrlqi128() takes __v2di instead of __v4si
This should fix the following errors when building the graphics/webp
port with base gcc:
lossless_sse2.c:403: error: incompatible type for argument 1 of '__builtin_ia32_psrlqi128'
lossless_sse2.c:404: error: incompatible type for argument 1 of '__builtin_ia32_psrlqi128'
Reported by: Jos Chrispijn <ports@webrz.net>
MFC after: 3 days
tuexen [Wed, 13 Aug 2014 15:50:16 +0000 (15:50 +0000)]
Add support for the SCTP_PR_STREAM_STATUS and SCTP_PR_ASSOC_STATUS
socket options. This includes managing the correspoing stat counters.
Add the SCTP_DETAILED_STR_STATS kernel option to control per policy
counters on every stream. The default is off and only an aggregated
counter is available. This is sufficient for the RTCWeb usecase.
kib [Wed, 13 Aug 2014 05:53:41 +0000 (05:53 +0000)]
Add a knob LIBPTHREAD_BIGSTACK_MAIN, which instructs libthr to leave
the whole RLIMIT_STACK-sized region of the kernel-allocated stack as
the stack of main thread.
By default, the main thread stack is clamped at 2MB (4MB on 64bit
ABIs) and the rest is used for other threads stack allocation. Since
there is no programmatic way to adjust the size of the main thread
stack, pthread_attr_setstacksize() is too late, the knob allows user
to manage the main stack size both for single-threaded and
multi-threaded processes with the rlimit.
Reported by: "Ivan A. Kosarev" <ivan@ivan-labs.com>
Tested by: dim
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
kib [Wed, 13 Aug 2014 05:44:08 +0000 (05:44 +0000)]
If vm_page_grab() allocates a new page, the page is not inserted into
page queue even when the allocation is not wired. It is
responsibility of the vm_page_grab() caller to ensure that the page
does not end on the vm_object queue but not on the pagedaemon queue,
which would effectively create unpageable unwired page.
In exec_map_first_page() and vm_imgact_hold_page(), activate the page
immediately after unbusying it, to avoid leak.
In the uiomove_object_page(), deactivate page before the object is
unlocked. There is no leak, since the page is deactivated after
uiomove_fromphys() finished. But allowing non-queued non-wired page
in the unlocked object queue makes it impossible to assert that leak
does not happen in other places.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
ngie [Wed, 13 Aug 2014 04:56:27 +0000 (04:56 +0000)]
Integrate lib/libutil into the build/kyua
Remove the .t wrappers
Rename all of the TAP test applications from test-<test> to
<test>_test to match the convention described in the TestSuite
wiki page
humanize_number_test.c:
- Fix -Wformat warnings with counter variables
- Fix minor style(9) issues:
-- Header sorting
-- Variable declaration alignment/sorting in main(..)
-- Fit the lines in <80 columns
- Fix an off by one index error in the testcase output [*]
- Remove unnecessary `extern char * optarg;` (this is already provided by
unistd.h)
rpaulo [Wed, 13 Aug 2014 01:27:51 +0000 (01:27 +0000)]
Make sure the DTrace header files are built before depend and before
the build starts.
This adds a new variable DHDRS that contains a list of all DTrace
header files. Then, we use the beforedepend hook to make sure the
heaeder files are built.
Introduce a beforebuild dependency (from projects/bmake) based on
feedback from Simon J. Gerraty. This lets us generate the header
files without running make depend.
ngie [Tue, 12 Aug 2014 17:51:26 +0000 (17:51 +0000)]
Complete the usr.bin/yacc kyua integration work I originally
submitted via r268811
- Install the Kyuafile by adding FILES to FILESGROUPS
- Run the testcases with an unprivileged user
Some of the testcases depend upon behavior that's broken when
run as root on FreeBSD because of how permissions are treated
with access(2) vs eaccess(2), open(2), etc
- Simplify the test driver to just inspect the exit code from
run_test because it now exits with 0 if successful and exits
with !0 if unsuccessful
- Don't do ad hoc temporary directory creation/deletion; let Kyua
handle that
- Add entries for files removed in r268811 to
OptionalObsoleteFiles.inc
Many compilers may optimize away the overflow check `msg + l < msg',
where `msg' is a pointer and `l' is an integer, because pointer
overflow is undefined behavior in C.
Use a safe precondition test `l >= eom - msg' instead.
hselasky [Tue, 12 Aug 2014 11:45:57 +0000 (11:45 +0000)]
- Fix radix tree memory leakage when unloading modules using radix
trees. This happens because the logic inserting items into the radix
tree is allocating empty radix levels, when index zero does not
contain any items.
- Add proper error case handling, so that the radix tree does not end
up in a bad state, if memory cannot be allocated during insertion of
an item.
- Add check for inserting NULL items into the radix tree.
- Add check for radix tree getting too big.
kib [Tue, 12 Aug 2014 09:33:00 +0000 (09:33 +0000)]
Revision r269457 removed the Giant around mount and unmount code, but
r269533, which was tested before r269457 was committed, implicitely
relied on the Giant to protect the manipulations of the softdepmounts
list. Use softdep global lock consistently to guarantee the list
structure now.
Insert the new struct mount_softdeps into the softdepmounts only after
it is sufficiently initialized, to prevent softdep_speedup() from
accessing bare memory. Similarly, remove struct mount_softdeps for
the unmounted filesystem from the tailq before destroying structure
rwlock.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
ae [Tue, 12 Aug 2014 09:10:13 +0000 (09:10 +0000)]
Add sysctl and loader tunable kern.geom.part.mbr.enforce_chs that is set
by default. It can be used to disable automatic alignment to CHS geometry,
that GEOM_PART_MBR does.
alc [Mon, 11 Aug 2014 17:45:41 +0000 (17:45 +0000)]
Change {_,}pmap_allocpte() so that they look for the flag PMAP_ENTER_NOSLEEP
instead of M_NOWAIT/M_WAITOK when deciding whether to sleep on page table
page allocation. (The same functions in the i386/xen and mips pmap
implementations already use PMAP_ENTER_NOSLEEP.)
gjb [Mon, 11 Aug 2014 16:31:28 +0000 (16:31 +0000)]
In arm/release.sh, continue if 'xdev-links' target fails
where the target is not valid (stable/10), instead of doing
per-branch evaluation on if xdev-links needs to be invoked.
royger [Mon, 11 Aug 2014 15:37:02 +0000 (15:37 +0000)]
blkfront: add support for unmapped IO
Using unmapped IO is really beneficial when running inside of a VM,
since it avoids IPIs to other vCPUs in order to invalidate the
mappings.
This patch adds unmapped IO support to blkfront. The following tests
results have been obtained when running on a Xen host without HAP:
PVHVM
3165.84 real 6354.17 user 4483.32 sys
PVHVM with unmapped IO
2099.46 real 4624.52 user 2967.38 sys
This is because when running using shadow page tables TLB flushes and
range invalidations are much more expensive, so using unmapped IO
provides a very important performance boost.
imp [Mon, 11 Aug 2014 14:50:49 +0000 (14:50 +0000)]
Remove dependence on source tree options. Move all kernel module
options into kern.opts.mk and change all the places where we use
src.opts.mk to pull in the options. Conditionally define SYSDIR and
use SYSDIR/conf/kern.opts.mk instead of a CURDIR path. Replace all
instances of CURDIR/../../etc with STSDIR, but only in the affected
files.
As a special compatibility hack, include bsd.owm.mk at the top of
kern.opts.mk to allow the bare build of sys/modules to work on older
systems. If the defaults ever change between 9.x, 10.x and current for
these options, however, you'll wind up with the host OS' defaults
rather than the -current defaults. This hack will be removed when
we no longer need to support this build scenario.
ache [Mon, 11 Aug 2014 12:26:48 +0000 (12:26 +0000)]
Fix too long (seed length >12 chars) challenge handling.
1) " ext" length should be included into OPIE_CHALLENGE_MAX (as all places
of opie code expects that).
2) Overflow check in challenge.c is off by 1 even with corrected
OPIE_CHALLENGE_MAX
3) When fallback to randomchallenge() happens and rval is 0 (i.e.
challenge is too long), its value should be set to error state too.
To demonstrate the bug, run opiepasswd with valid seed:
opiepasswd -s 1234567890123456
and notice that it falls back to randomchallenge() (i.e. no 1234567890123456 in the prompt).
dumbbell [Sun, 10 Aug 2014 17:04:10 +0000 (17:04 +0000)]
vt(4): Colors are indexed against a console palette, not a VGA palette
Rename vt_generate_vga_palette() to vt_generate_cons_palette() and
change it to build a palette where the color index is the same than in
terminal escape codes, not the VGA index. That's what TCHAR_CREATE()
uses and passes to vt(4).
The main differences between both orders are:
o Blue and red are swapped (1 <-> 4)
o Yellow and cyan are swapped (3 <-> 6)
The problem remained unnoticed, because the RGB bit indexes passed to
vt_generate_vga_palette() were reversed. This inversion was cancelled
by the colors inversions in the generated palette. For instance, red
(0xff0000) and blue (0x0000ff) have bytes in opposite order, but were
swapped in the palette. But after changing the value of blue (see last
paragraph), the modified color was in fact the red one.
This commit includes a fix to creator_vt.c, submitted by Nathan
Whitehorn: fb_cmsize is set to 16. Before this, the generated palette
would be overwritte. This fixes colors on sparc64 with a Creator3D
adapter.
While here, tune the palette to better match console colors and improve
the readability (especially the dark blue).
Submitted by: nwhitehorn (fix to creator_vt.c)
MFC after: 1 week
kib [Sun, 10 Aug 2014 16:59:39 +0000 (16:59 +0000)]
On sparc64, do not keep mappings for the destroyed sf_bufs. Sparc64
pmap, unlike i386, and similar to i386/xen pv, does not tolerate
abandoned mappings for the freed pages.
Reported and tested by: dumbbell
Diagnosed and reviewed by: alc
Sponsored by: The FreeBSD Foundation
dumbbell [Sun, 10 Aug 2014 15:02:51 +0000 (15:02 +0000)]
vt(4): Add vtbuf_dirty*_locked() to lock vtbuf once, not twice
In several functions, vtbuf_putchar() in particular, the lock on vtbuf
is acquired twice:
1. once by the said functions;
2. once in vtbuf_dirty().
Now, vtbuf_dirty_locked() and vtbuf_dirty_cell_locked() allow to
acquire that lock only once.
This improves the input speed of vt(4). To measure the gain, a
50,000-lines file was displayed on the console using cat(1). The time
taken by cat(1) is reported below:
o On amd64, with vt_vga:
- before: 1.0"
- after: 0.5"
o On sparc64, with creator_vt:
- before: 13.6"
- after: 10.5"
dumbbell [Sun, 10 Aug 2014 14:55:39 +0000 (14:55 +0000)]
fbd: Fix a bug where vt_fb_attach() success would be considered a failure
vt_fb_attach() currently always returns 0, but it could return a code
defined in errno.h. However, it doesn't return a CN_* code. So checking
its return value against CN_DEAD (which is 0) is incorrect, and in this
case, a success becomes a failure.
The consequence was unimportant, because the caller (drm_fb_helper.c)
would only log an error message in this case. The console would still
work.
adrian [Sun, 10 Aug 2014 08:35:42 +0000 (08:35 +0000)]
Undo r195846 for now - allow raw frame transmit in monitor mode.
The original commit was supposed to stop the ability to do raw frame
injection in monitor mode to arbitrary channels (whether supported
by regulatory or not) however it doesn't seem to have been followed
by any useful way of doing it.
Apparently AHDEMO is supposed to be that way, but it seems to require
too much fiddly things (disable scanning, set a garbage SSID, etc)
for it to actually be useful for spoofing things.
So for now let's just disable it and instead look to filter transmit
in the output path if the channel isn't allowed by regulatory.
That way monitor RX works fine but TX will be blocked.
I don't plan on MFC'ing this to -10 until the regulatory enforcement
bits are written.
rpaulo [Sun, 10 Aug 2014 06:43:40 +0000 (06:43 +0000)]
Fix a few problems with the USDT probes:
* Include OBJDIR to make sure the generated file is found;
* Simplify the definition of OBJS;
* Add targets for shared objects and for profiled objects.