Peter Grehan [Fri, 26 Oct 2012 13:40:12 +0000 (13:40 +0000)]
Remove mptable generation code from libvmmapi and move it to bhyve.
Firmware tables require too much knowledge of system configuration,
and it's difficult to pass that information in general terms to a library.
The upcoming ACPI work exposed this - it will also livein bhyve.
Also, remove code specific to NetApp from the mptable name, and remove
the -n option from bhyve.
Neel Natu [Wed, 24 Oct 2012 02:54:21 +0000 (02:54 +0000)]
Maintain state regarding NMI delivery to guest vcpu in VT-x independent manner.
Also add a stats counter to count the number of NMIs delivered per vcpu.
Peter Grehan [Fri, 19 Oct 2012 18:11:17 +0000 (18:11 +0000)]
Rework how guest MMIO regions are dealt with.
- New memory region interface. An RB tree holds the regions,
with a last-found per-vCPU cache to deal with the common case
of repeated guest accesses to MMIO registers in the same page.
- Support memory-mapped BARs in PCI emulation.
mem.c/h - memory region interface
instruction_emul.c/h - remove old region interface.
Use gpa from EPT exit to avoid a tablewalk to
determine operand address. Determine operand size
and use when calling through to region handler.
fbsdrun.c - call into region interface on paging
exit. Distinguish between instruction emul error
and region not found
pci_emul.c/h - implement new BAR callback api.
Split BAR alloc routine into routines that
require/don't require the BAR phys address.
ioapic.c
pci_passthru.c
pci_virtio_block.c
pci_virtio_net.c
pci_uart.c - update to new BAR callback i/f
Neel Natu [Thu, 4 Oct 2012 03:07:05 +0000 (03:07 +0000)]
The ioctl VM_GET_MEMORY_SEG is no longer able to return the host physical
address associated with the guest memory segment. This is because there is
no longer a 1:1 mapping between GPA and HPA.
As a result 'vmmctl' can only display the guest physical address and the
length of the lowmem and highmem segments.
Neel Natu [Thu, 4 Oct 2012 02:27:14 +0000 (02:27 +0000)]
Change vm_malloc() to map pages in the guest physical address space in 4KB
chunks. This breaks the assumption that the entire memory segment is
contiguously allocated in the host physical address space.
This also paves the way to satisfy the 4KB page allocations by requesting
free pages from the VM subsystem as opposed to hard-partitioning host memory
at boot time.
Peter Grehan [Wed, 3 Oct 2012 04:22:39 +0000 (04:22 +0000)]
Allow the number of FICL dictionary cells to be overridden.
Loading a 7.3 ISO with userboot/amd64 takes up 10035 cells,
overflowing the long-standing default of 10000.
Peter Grehan [Wed, 3 Oct 2012 02:58:55 +0000 (02:58 +0000)]
Restore the ability to boot partitioned disks. The previous submit
broke that by forcing raw disks, due to the use of error returns
by userboot's initial disk opens.
Peter Grehan [Tue, 2 Oct 2012 04:41:43 +0000 (04:41 +0000)]
Fix the error return in disk_readslicetab() when an MBR/GPT partition
wasn't found, and use that in userdisk_open() to allow raw disks
and ISO images to be read.
This is a temporary fix - disk.c has changed a lot in CURRENT so this
code may be reworked or made redundant on the next IFC. It is useful
to be able to boot from CD in the meantime.
Neel Natu [Sat, 29 Sep 2012 01:15:45 +0000 (01:15 +0000)]
Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.
In this case vm_malloc() was using vm_gpa2hpa() to indirectly infer whether
or not the address range had already been allocated.
Replace this instead with an explicit API 'vm_gpa_available()' that returns
TRUE if a page is available for allocation in guest physical address space.
Neel Natu [Thu, 27 Sep 2012 00:27:58 +0000 (00:27 +0000)]
Intel VT-x provides the length of the instruction at the time of the nested
page table fault. Use this when fetching the instruction bytes from the guest
memory.
Also modify the lapic_mmio() API so that a decoded instruction is fed into it
instead of having it fetch the instruction bytes from the guest. This is
useful for hardware assists like SVM that provide the faulting instruction
as part of the vmexit.
Neel Natu [Tue, 25 Sep 2012 02:33:25 +0000 (02:33 +0000)]
Add an explicit exit code 'SPINUP_AP' to tell the controlling process that an
AP needs to be activated by spinning up an execution context for it.
The local apic emulation is now completely done in the hypervisor and it will
detect writes to the ICR_LO register that try to bring up the AP. In response
to such writes it will return to userspace with an exit code of SPINUP_AP.
Neel Natu [Mon, 6 Aug 2012 07:20:25 +0000 (07:20 +0000)]
Fix a bug in how a 64-bit bar in a pci passthru device would be presented to
the guest. Prior to the fix it was possible for such a bar to appear as a
32-bit bar as long as it was allocated from the region below 4GB.
This had the potential to confuse some drivers that were particular about
the size of the bars.
Neel Natu [Sat, 4 Aug 2012 23:51:21 +0000 (23:51 +0000)]
The displacement field in the decoded instruction should be treated as a 8-bit
or 32-bit signed integer.
Simplify the handling of indirect addressing with displacement by
unconditionally adding the 'instruction->disp' to the target address.
This is alright since 'instruction->disp' is non-zero only for the
addressing modes that specify a displacement.
Neel Natu [Sat, 4 Aug 2012 02:14:27 +0000 (02:14 +0000)]
There is no need to explicitly specify the CR4_VMXE bit when writing to guest
CR4. This bit is specific to the Intel VTX and removing it makes the library
more portable to AMD/SVM.
In the Intel VTX implementation, the hypervisor will ensure that this bit is
always set. See vmx_fix_cr4() for details.
Neel Natu [Sat, 4 Aug 2012 02:06:55 +0000 (02:06 +0000)]
Force certain bits in %cr4 to be hard-wired to '1' or '0' from a guest's
perspective. If we don't do this some guest OSes (e.g. Linux) will reset
the CR4_VMXE bit in %cr4 with disastrous consequences.
Peter Grehan [Wed, 1 Aug 2012 05:53:20 +0000 (05:53 +0000)]
Bump up the heap size to 1MB. With a few kernel modules, libstand
zalloc and userboot seem to want to use ~600KB of heap space, which
results in a segfault when malloc fails in bhyveload.
Note that the I/O systems in FreeBSD and Solaris/Illumos are sufficiently
different that there is not a 1:1 mapping from scripts that work
with one to the other.
MFC after: 1 month
Peter Grehan [Wed, 11 Jul 2012 02:57:19 +0000 (02:57 +0000)]
Various VirtIO improvements
PCI:
- Properly handle interrupt fallback from MSIX to MSI to legacy.
The host may not have sufficient resources to support MSIX,
so we must be able to fallback to legacy interrupts.
- Add interface to get the (sub) vendor and device IDs.
- Rename flags to VTPCI_FLAG_* like other VirtIO drivers.
Block:
- No longer allocate vtblk_requests from separate UMA zone.
malloc(9) from M_DEVBUF is sufficient. Assert segment counts
at allocation.
- More verbose error and debug messages.
Network:
- Remove stray write once variable.
Virtqueue:
- Shuffle code around in preparation of converting the mb()s to
the appropriate atomic(9) operations.
- Only walk the descriptor chain when freeing if INVARIANTS is
defined since the result is only KASSERT()ed.
Warner Losh [Tue, 10 Jul 2012 19:48:42 +0000 (19:48 +0000)]
Go ahead and disable the interrupts for the DBGU the boot loader may
have left enabled after we detect the CPU, and remove the multiplely
copied code from the SoC modules.
Lawrence Stewart [Tue, 10 Jul 2012 08:31:28 +0000 (08:31 +0000)]
Move the ffclock symbols from FBSD_1.2 to FBSD_1.3 where they should have been
put initially. They were added to head during development of 10-CURRENT, not
9-CURRENT.
Adrian Chadd [Tue, 10 Jul 2012 07:45:47 +0000 (07:45 +0000)]
Add some debugging and comments about what's going on when reinitialising
the FIFO.
I still see some corner cases where no RX occurs when it should be
occuring. It's quite possible that there's a subtle race condition
somewhere; or maybe I'm not programming the RX queues right.
There's also no locking here yet, so any reset/configuration path
state change (ie, enabling/disabling receive from the ioctl, net80211
taskqueue, etc) could quite possibly confuse things.
Adrian Chadd [Tue, 10 Jul 2012 06:05:42 +0000 (06:05 +0000)]
Add/fix EDMA RX behaviour.
* For now, kickpcu should hopefully just do nothing - the PCU doesn't need
'kicking' for Osprey and later NICs. The PCU will just restart once
the next FIFO entry is pushed in.
* Teach "proc" about "dosched", so it can be used to just flush the
FIFO contents without adding new FIFO entries.
* .. and now, implement the RX "flush" routine.
* Re-initialise the FIFO contents if the FIFO is empty (the DP is NULL.)
When PCU RX is disabled (ie, writing RX_D to the RX configuration
register) then the FIFO will be completely emptied. If the software FIFO
is full, then no further descriptors are pushed into the FIFO and
things stall.
This all requires much, much more thorough stress testing.
David Xu [Tue, 10 Jul 2012 05:45:13 +0000 (05:45 +0000)]
Always clear p_xthread if current thread no longer needs it, in theory, if
debugger exited without calling ptrace(PT_DETACH), there is a time window
that the p_xthread may be pointing to non-existing thread, in practical,
this is not a problem because child process soon will be killed by parent
process.
- Remove the unused and not completed write support for NTFS.
- Fix a bug where vfs_mountedfrom() is called also when the filesystem
is not mounted successfully.
Adrian Chadd [Mon, 9 Jul 2012 23:58:22 +0000 (23:58 +0000)]
Add some AR9300 HAL descriptor definition changes.
* Add a couple of RX errors;
* Add the spectral scan PHY error code;
* extend the RX flags to be a 16 bit field, rather than an 8 bit field;
* Add a new RX flag.
Fix a bug in code that calculates the number of the first interrupt
vector for a port. This affected the gigabit ports of T422 cards (the
ones with 2x10G ports and 2x1G ports).
John Baldwin [Mon, 9 Jul 2012 20:55:39 +0000 (20:55 +0000)]
Add a clts() wrapper around the 'clts' instruction to <machine/cpufunc.h>
on x86 and use that to implement stop_emulating() in the fpu/npx code.
Reimplement start_emulating() in the non-XEN case by using load_cr0() and
rcr0() instead of the 'lmsw' and 'smsw' instructions. Intel explicitly
discourages the use of 'lmsw' and 'smsw' on 80386 and later processors in
the description of these instructions in Volume 2 of the ADM.