Neel Natu [Mon, 2 Mar 2015 20:13:49 +0000 (20:13 +0000)]
Fix warnings/errors when building vmm.ko with gcc:
- fix warning about comparison of 'uint8_t v_tpr >= 0' always being true.
- fix error triggered by an empty clobber list in the inline assembly for
"clgi" and "stgi"
- fix error when compiling "vmload %rax", "vmrun %rax" and "vmsave %rax". The
gcc assembler does not like the explicit operand "%rax" while the clang
assembler requires specifying the operand "%rax". Fix this by encoding the
instructions using the ".byte" directive.
Hiroki Sato [Mon, 2 Mar 2015 20:00:03 +0000 (20:00 +0000)]
Fix group membership of cloned interfaces when one is moved by
if_vmove().
In if_vmove(), if_detach_internal() and if_attach_internal() were
called in series to detach and reattach the interface. When
detaching, if_delgroup() was called and the interface left all of
its groups. Then, upon attachment, if_addgroup(ifp, IFG_ALL) was
called and the interface rejoined only the "all" group.
This was a problem. Normally, a cloned interface automatically joins
a group whose name is the ifc_name of the cloner, in addition to "all",
upon creation. However, if_vmove() removed that membership and did
not restore it upon attachment.
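The detach/attach sequence above can be modelled in a few lines of userland C. This is only a sketch: struct toy_ifnet, toy_addgroup(), toy_ingroup() and toy_vmove() are hypothetical stand-ins, not the kernel's if.c code, and the "restore" flag merely contrasts the old and new behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_GROUPS	4
#define TOY_GROUP_NAMELEN	16

/* Hypothetical stand-in for struct ifnet: only what the sketch needs. */
struct toy_ifnet {
	char groups[TOY_MAX_GROUPS][TOY_GROUP_NAMELEN];
	int ngroups;
	const char *cloner_name;	/* NULL if the interface was not cloned */
};

static void
toy_addgroup(struct toy_ifnet *ifp, const char *name)
{
	assert(ifp->ngroups < TOY_MAX_GROUPS);
	snprintf(ifp->groups[ifp->ngroups], TOY_GROUP_NAMELEN, "%s", name);
	ifp->ngroups++;
}

static bool
toy_ingroup(const struct toy_ifnet *ifp, const char *name)
{
	for (int i = 0; i < ifp->ngroups; i++)
		if (strcmp(ifp->groups[i], name) == 0)
			return (true);
	return (false);
}

/*
 * Detach drops every group and attach rejoins "all".  With
 * restore_cloner_group set, the cloner's group is rejoined as well.
 */
static void
toy_vmove(struct toy_ifnet *ifp, bool restore_cloner_group)
{
	ifp->ngroups = 0;		/* detach: leave all groups */
	toy_addgroup(ifp, "all");	/* attach: rejoin "all" */
	if (restore_cloner_group && ifp->cloner_name != NULL)
		toy_addgroup(ifp, ifp->cloner_name);
}
```

Calling toy_vmove() with the flag clear reproduces the lost membership; with it set, the interface ends up in both "all" and the cloner's group.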
Change the sa(4) driver to check for long position support on
SCSI-2 devices.
Some older tape devices claim to be SCSI-2, but actually do support
long position information. (Long position information includes
the current file mark.) For example, the COMPAQ SuperDLT1.
So we now only disable the check on SCSI-1 and older devices.
sys/cam/scsi/scsi_sa.c:
In saregister(), only disable fetching long position
information on SCSI-1 and older drives. Update the
comment to explain why.
Hiroki Sato [Mon, 2 Mar 2015 17:30:26 +0000 (17:30 +0000)]
Implement Enhanced DAD algorithm for IPv6 described in
draft-ietf-6man-enhanced-dad-13.
This basically adds a random nonce option (RFC 3971) to the NS messages
used as DAD probes so that a looped-back packet can be detected. Such
looped-back packets prevented DAD from working on some pseudo-interfaces
that aggregate multiple L2 links, such as lagg(4).
The length of the nonce is set to 6 bytes. This algorithm can be disabled by
setting the net.inet6.ip6.dad_enhanced sysctl to 0 on a per-vnet basis.
Reported by: hiren
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D1835
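A rough userland sketch of the nonce check follows. It assumes the RFC 3971 Nonce option layout (type 14, length counted in 8-octet units) and the 6-byte nonce from this commit; the toy_* names are hypothetical and rand() stands in for whatever random source the kernel really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_ND_OPT_NONCE	14	/* Nonce option type from RFC 3971 */
#define TOY_DAD_NONCE_LEN	6	/* nonce length stated in the commit */

/* Nonce option: type, length in units of 8 octets, then the nonce. */
struct toy_nd_opt_nonce {
	uint8_t type;
	uint8_t len;
	uint8_t nonce[TOY_DAD_NONCE_LEN];
};

/* Build the option for an outgoing DAD probe and remember the nonce sent. */
static void
toy_dad_make_probe(struct toy_nd_opt_nonce *opt,
    uint8_t sent[TOY_DAD_NONCE_LEN])
{
	opt->type = TOY_ND_OPT_NONCE;
	opt->len = sizeof(*opt) / 8;	/* 2 + 6 bytes is one 8-octet unit */
	for (int i = 0; i < TOY_DAD_NONCE_LEN; i++)
		sent[i] = (uint8_t)(rand() & 0xff);	/* placeholder RNG */
	memcpy(opt->nonce, sent, TOY_DAD_NONCE_LEN);
}

/* True if a received NS carries our own nonce, i.e. our probe came back. */
static bool
toy_dad_is_looped_back(const struct toy_nd_opt_nonce *rcvd,
    const uint8_t sent[TOY_DAD_NONCE_LEN])
{
	return (rcvd->type == TOY_ND_OPT_NONCE &&
	    memcmp(rcvd->nonce, sent, TOY_DAD_NONCE_LEN) == 0);
}
```

An NS whose nonce matches the one just sent is treated as the probe looped back rather than as a duplicate address, which is what lets DAD complete on aggregating interfaces.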
Adrian Chadd [Mon, 2 Mar 2015 02:24:46 +0000 (02:24 +0000)]
Bring over the initial QCA955x SoC support framework.
This is enough to bring up the basic SoC support.
What works thus far:
* The mips74k core, pll setup, and UART (or else well, stuff would
be really difficult..)
* both USB 2.0 EHCI controllers
* on-board 2GHz 3x3 wifi (the other variant has 2GHz/5GHz wifi on-chip);
* arge0 - not yet sure why arge1 isn't firing off interrupts and thus
handling traffic, but I will soon figure it out and fix it here.
Tested:
* AP135 reference design, QCA9558 SoC, pretending to be an 11n
2GHz AP.
TODO:
* There's an interrupt mux hooking up devices to IP2 and IP3 - but it's
not a read-and-clear or write-to-clear register. So, trying to use it
naively like I have been ends up with massive interrupt storms.
For now the things that share those interrupts can just take them as
shared interrupts and try to play nice.
* There's two PCIe root complexes /and/ one of them can actually be
a PCIe device endpoint. Yes, you heard right. I have to teach the
AR724x PCIe bridge code to handle multiple instances with multiple
memory/irq regions, and then there'll be RC support, but EP support
isn't on my TODO list.
* I'm not sure why arge1 isn't up and running. I'll go figure that
out soon and fix it here.
Thank you to Qualcomm Atheros for providing me with hardware and
an abundance of documentation about these things.
Adrian Chadd [Mon, 2 Mar 2015 02:14:44 +0000 (02:14 +0000)]
Lay some groundwork for having this stuff hang off of AHB rather than
the CPU nexus.
* Add ahb as a possible bus attachment
* Add a comment explaining what's going on, to remind me or whoever
else ends up trying to debug why the EEPROM isn't mapped in.
Warner Losh [Sun, 1 Mar 2015 21:41:37 +0000 (21:41 +0000)]
nandfs_meta_bread() calls bread() which can set bp to NULL in some
error cases. Calling brelse() with a NULL pointer is not allowed,
so only call brelse() when the bp is non-NULL.
Reported by: Maxime Villard (reported as uninitialized variable)
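The guard can be sketched as follows, with toy_bread() and toy_brelse() as hypothetical userland stand-ins for bread() and brelse(); this is not the nandfs source, just the shape of the check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_buf {
	int released;
};

static struct toy_buf toy_storage;

/* Stand-in for bread(): on error it may leave *bpp set to NULL. */
static int
toy_bread(int fail, struct toy_buf **bpp)
{
	if (fail) {
		*bpp = NULL;
		return (EIO);
	}
	toy_storage.released = 0;
	*bpp = &toy_storage;
	return (0);
}

/* Stand-in for brelse(): a NULL buffer is not allowed. */
static void
toy_brelse(struct toy_buf *bp)
{
	assert(bp != NULL);
	bp->released = 1;
}

/* Error path releases the buffer only when there is one to release. */
static int
toy_meta_bread(int fail, struct toy_buf **bpp)
{
	int error;

	error = toy_bread(fail, bpp);
	if (error != 0) {
		if (*bpp != NULL)	/* the guard described above */
			toy_brelse(*bpp);
		*bpp = NULL;
		return (error);
	}
	return (0);
}
```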
Fix an extremely subtle concurrency bug triggered by running on 32-thread
POWER8 systems. During thread switch, there was a very small window when
the stack pointer was set to the stack pointer of the outgoing thread, but
after the lock on that thread had already been released.
If, during that window, the outgoing thread were rescheduled on another CPU
and began execution, and an exception were taken on the original CPU, the
trap handler and the outgoing thread would simultaneously execute on the same
stack, causing memory corruption. Fix this by making sure to release the
old thread only after cpu_switch() is done with its stack.
Steve Kargl [Sun, 1 Mar 2015 20:32:47 +0000 (20:32 +0000)]
Give compilers a stronger hint to inline the functions
pzero[f], qzero[f], pone[f], and qone[f]. While here
fix the function declarations in accordance with style(9).
Steve Kargl [Sun, 1 Mar 2015 20:26:03 +0000 (20:26 +0000)]
When j0() and j1() were converted to j0f() and j1f(), the threshold
values for the different intervals were not converted correctly.
Adjust the threshold values so that they agree with the
comments.
Now, depending upon how things are wired up, the second CPU port (MAC1)
can be wired to either the switch (port6), or through port5's PHY, bypassing
the GMAC+switch entirely. I.e., it can pretend to be a boring PHY, saving
system designers from having to include a separate PHY for a "WAN" port.
Here's the rub - the AP135 board (QCA955x SoC) hooks up arge0 to
the second CPU port on the AR8327, but it's hooked up as RGMII.
So, in order to hook it up to the rest of the switch, it isn't configured
as a separate PHY - OpenWRT has it set up as connected via RGMII to
GMAC6 and (I'm guessing) it's set to be a WAN port by configuring up
port-based VLANs or something.
Thus, with a port mask of 0x3f, GMAC6 was never allowed to receive traffic
from any other port. It could transmit fine, but not receive anything.
So, now it works enough for me to continue doing board bootstrapping.
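The arithmetic behind the 0x3f mask is easy to check. The helpers below are hypothetical and assume the usual convention that bit n of the mask stands for port n; they say nothing about the actual AR8327 register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit n set means port n is included in the mask. */
static bool
toy_port_in_mask(uint32_t port_mask, unsigned int port)
{
	return ((port_mask & (UINT32_C(1) << port)) != 0);
}

/* Mask covering ports 0..last_port inclusive. */
static uint32_t
toy_port_mask(unsigned int last_port)
{
	assert(last_port < 31);
	return ((UINT32_C(1) << (last_port + 1)) - 1);
}
```

Ports 0 through 5 give 0x3f, which leaves bit 6 clear; a mask that also covers port 6 would be 0x7f.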
Note, this isn't enough to make the QCA955x + AR8327 work - there's
a bunch of uncommitted work to both the platform SoC (interrupt handling,
ethernet, etc) and the ethernet switch (register access space, setup, etc)
that needs to happen. However, this particular change is also relevant to
other SoCs, like the AR934x and AR7161, both of which can be glued to
this switch.
Tested:
* AP135 development board
TODO:
* Figure out whether I can somehow abuse another port mode to have this
be a pass-through PHY, or whether I should just create some more boot
time hints to explicitly set up port-based isolation so this works
in a more useful way by default.
vt(4): Add support to "downgrade" from e.g. vt_fb to vt_vga
The main purpose of this feature is to be able to unload a KMS driver.
When going back from the current vt(4) backend to the previous backend,
the previous backend is reinitialized with the special VDF_DOWNGRADE
flag set. Then the current driver is terminated with the new "vd_fini"
callback.
In the case of vt_fb and vt_vga, this allows the former to pass the
vgapci device vt_fb used to vt_vga so the device can be rePOSTed.
Andrew Turner [Sun, 1 Mar 2015 10:04:14 +0000 (10:04 +0000)]
Fix the dtrace ARM atomic compare-and-set functions. These functions are
expected to return the data in the memory location pointed at by target
after the operation. The FreeBSD atomic functions used previously only
return 0 or 1 to indicate whether the comparison succeeded.
With this change, these functions support only ARMv6 and later.
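The difference between the two return conventions can be sketched with C11 atomics. This is not the dtrace ARM assembly; toy_cmpset32() and toy_cas32() are hypothetical names, and the sketch assumes the conventional value-returning contract, where the caller gets back the value found at the target when the comparison was made.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag-returning style: reports only whether the swap happened. */
static bool
toy_cmpset32(_Atomic uint32_t *target, uint32_t cmp, uint32_t newval)
{
	return (atomic_compare_exchange_strong(target, &cmp, newval));
}

/* Value-returning style: reports what was found at *target. */
static uint32_t
toy_cas32(_Atomic uint32_t *target, uint32_t cmp, uint32_t newval)
{
	uint32_t observed = cmp;

	/* On failure, C11 stores the value it found into 'observed'. */
	(void)atomic_compare_exchange_strong(target, &observed, newval);
	return (observed);
}
```

A caller of the value-returning form compares the result with cmp to learn whether the swap took place, so a 0/1 flag cannot be substituted for it.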
Adrian Chadd [Sun, 1 Mar 2015 07:00:34 +0000 (07:00 +0000)]
Add very initial QCA955x awareness to the GPIO code.
There's a lot more to come - the QCA955x has a bunch more GPIO MUX
configuration, reminiscent of what the ARM chips let you do - but
it'll have to come later.
Ryan Stone [Sun, 1 Mar 2015 00:52:34 +0000 (00:52 +0000)]
Add functions for parsing the iovctl config file
Add two functions for parsing the iovctl config file. The config
file is parsed using libucl[1], which accepts most YAML files and
a superset of JSON. The first function is an ad-hoc parser that
searches the file for the PF.DEVICE configuration value. We need
to know that value in order to fetch the schema from the kernel,
and we need the schema in order to be able to fully parse the file.
The second function parses the config file and validates it
against a schema. This function will exit with an error message
if any validation error occurs. If it succeeds, the configuration
is returned as an nvlist suitable for passing to the kernel.
Ryan Stone [Sun, 1 Mar 2015 00:52:28 +0000 (00:52 +0000)]
Add iovctl functions for validating config
Add a function to iovctl that validates the configuration against
a schema. This function is able to assume that the parser has
done most of the validation already and it's only responsible for
applying default VF values specified in the config file, confirming
that all required parameters have been set and that no invalid VF
numbers have been specified.
Ryan Stone [Sun, 1 Mar 2015 00:40:57 +0000 (00:40 +0000)]
Pass SR-IOV configuration to kernel using an nvlist
Pass all SR-IOV configuration to the kernel using an nvlist. The
main benefit that this offers is flexibility. It allows a driver
to accept any number of parameters of any type supported by the
SR-IOV configuration infrastructure without having to make any
changes outside of the driver.
It also offers the user very fine-grained control over the
configuration of the VFs -- if they want, they can have different
configuration applied to every VF.
Ryan Stone [Sun, 1 Mar 2015 00:40:51 +0000 (00:40 +0000)]
Add function to validate the consistency of SR-IOV config
Add a function that validates that the user-provided SR-IOV
configuration is valid. This includes basic checks that the
structure of the configuration is correct (e.g. all required
configuration nodes are present) as well as validating against
a configuration schema.
The schema validation consists of:
- Ensuring that all required config parameters are present.
- If the schema defines a default value for a parameter,
adding the default value if the parameter is not set.
- Ensuring that no parameters are specified in the config
that are not defined in the schema.
- Ensuring that all parameters have the correct type defined in the schema.
- Ensuring that no configuration nodes are present for devices
that do not exist. For example, if 2 VFs are configured,
then we validate that a node called VF-5 does not exist.
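The checks listed above can be sketched over flat arrays instead of nvlists. Everything here (the toy_* types, the fixed-size parameter table, the -1 error return) is an assumption made for illustration and not the kernel's validation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum toy_type { TOY_BOOL, TOY_UINT };

struct toy_schema_entry {
	const char *name;
	enum toy_type type;
	bool required;
	bool has_default;
	unsigned int def;
};

struct toy_param {
	const char *name;
	enum toy_type type;
	unsigned int value;
};

#define TOY_MAX_PARAMS	8

struct toy_config {
	struct toy_param params[TOY_MAX_PARAMS];
	size_t nparams;
};

static const struct toy_param *
toy_find(const struct toy_config *cfg, const char *name)
{
	for (size_t i = 0; i < cfg->nparams; i++)
		if (strcmp(cfg->params[i].name, name) == 0)
			return (&cfg->params[i]);
	return (NULL);
}

/* Returns 0 on success, -1 on the first validation failure. */
static int
toy_validate(struct toy_config *cfg, const struct toy_schema_entry *schema,
    size_t nschema)
{
	/* Every parameter must be known to the schema and correctly typed. */
	for (size_t i = 0; i < cfg->nparams; i++) {
		const struct toy_schema_entry *e = NULL;

		for (size_t j = 0; j < nschema; j++)
			if (strcmp(schema[j].name, cfg->params[i].name) == 0)
				e = &schema[j];
		if (e == NULL || e->type != cfg->params[i].type)
			return (-1);
	}
	/* Missing parameters take their default, or fail if required. */
	for (size_t j = 0; j < nschema; j++) {
		if (toy_find(cfg, schema[j].name) != NULL)
			continue;
		if (schema[j].has_default) {
			if (cfg->nparams == TOY_MAX_PARAMS)
				return (-1);
			cfg->params[cfg->nparams++] = (struct toy_param){
			    schema[j].name, schema[j].type, schema[j].def };
		} else if (schema[j].required)
			return (-1);
	}
	return (0);
}

/* A node named VF-n is only valid when n is below the configured VF count. */
static bool
toy_vf_node_valid(unsigned int vf, unsigned int num_vfs)
{
	return (vf < num_vfs);
}
```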
Ryan Stone [Sun, 1 Mar 2015 00:40:26 +0000 (00:40 +0000)]
Allocate PCI I/O memory spaces for VFs
When creating VFs, we must size each SR-IOV BAR on the PF and
allocate a contiguous I/O memory window large enough for every VF.
However, the window only needs to be aligned to a boundary equal
to the size of the window for a single VF.
When a VF attempts to allocate an I/O memory resource, we must
intercept the request in the pci driver and pass it off to the
SR-IOV code, which will allocate the correct window from the
pre-allocated memory space for the PF.
Inform the pci driver about the size and address of the BARs on
the VF when the VF is created. This is required by pciconf -b and
bhyve.
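How a VF's request maps onto the pre-allocated PF window can be sketched as plain address arithmetic. The helpers are hypothetical and ignore the resource manager entirely; they only show that each VF gets the slice at base + vf * size.

```c
#include <assert.h>
#include <stdint.h>

/* Total window needed on the PF for one SR-IOV BAR. */
static uint64_t
toy_iov_window_size(uint64_t vf_bar_size, unsigned int num_vfs)
{
	return (vf_bar_size * num_vfs);
}

/* Address of the slice of the pre-allocated window that belongs to 'vf'. */
static uint64_t
toy_vf_bar_addr(uint64_t window_base, uint64_t vf_bar_size, unsigned int vf)
{
	/* The window is aligned to one VF BAR, so every slice is as well. */
	assert(vf_bar_size != 0 && window_base % vf_bar_size == 0);
	return (window_base + (uint64_t)vf * vf_bar_size);
}
```

With a 16 KB BAR and three VFs, the window is 48 KB, and it only has to start on a 16 KB boundary.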
Ryan Stone [Sun, 1 Mar 2015 00:40:19 +0000 (00:40 +0000)]
Emulate the Device ID and Vendor ID registers for VFs
The SR-IOV standard requires VFs to return all-ones when the VID
and DID registers are read. The VMM (hypervisor) is required to
emulate them instead. Make pci_read_config() do this emulation.
Change pci_user.c to use pci_read_config() to read config space
registers instead of going directly to the pcib so that the
emulated VID/DID registers work correctly on VFs. This is
required both for pciconf and bhyve PCI passthrough.
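A minimal model of the emulation, limited to 16-bit reads of the two ID registers. The struct and function names are hypothetical; only the register offsets (0x00 for the vendor ID, 0x02 for the device ID) come from the PCI configuration header layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PCIR_VENDOR	0x00	/* Vendor ID register offset */
#define TOY_PCIR_DEVICE	0x02	/* Device ID register offset */

struct toy_pci_dev {
	bool is_vf;
	uint16_t vendor;	/* IDs recorded when the device was created */
	uint16_t device;
};

/* Stand-in for the raw read: a VF answers all-ones for VID and DID. */
static uint16_t
toy_hw_read_id(const struct toy_pci_dev *dev, int reg)
{
	if (dev->is_vf)
		return (0xffff);
	return (reg == TOY_PCIR_VENDOR ? dev->vendor : dev->device);
}

/* Emulating read: VFs get the recorded IDs instead of the raw answer. */
static uint16_t
toy_read_config_id(const struct toy_pci_dev *dev, int reg)
{
	assert(reg == TOY_PCIR_VENDOR || reg == TOY_PCIR_DEVICE);
	if (dev->is_vf)
		return (reg == TOY_PCIR_VENDOR ? dev->vendor : dev->device);
	return (toy_hw_read_id(dev, reg));
}
```

Any consumer that bypasses the emulating read sees 0xffff on a VF, which is why the userland path has to go through the same function.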
Ryan Stone [Sun, 1 Mar 2015 00:40:09 +0000 (00:40 +0000)]
Implement interface to create SR-IOV Virtual Functions
Implement the interface to create SR-IOV Virtual Functions (VFs).
When a driver registers that it supports SR-IOV by calling
pci_setup_iov(), the SR-IOV code creates a new node in /dev/iov
for that device. An ioctl can be invoked on that device to
create VFs and have the driver initialize them.
At this point, allocating memory I/O windows (BARs) is not
supported.
Ryan Stone [Sun, 1 Mar 2015 00:39:48 +0000 (00:39 +0000)]
Allow passthrough devices to be hinted.
Allow the ppt driver to attach to devices that were hinted to be
passthrough devices by the PCI code creating them with a driver
name of "ppt".
Add a tunable that allows the IOMMU to be forced to be used. With
SR-IOV passthrough devices the VFs may be created after vmm.ko is
loaded. The current code will not initialize the IOMMU in that
case, meaning that the passthrough devices can't actually be used.
Ryan Stone [Sun, 1 Mar 2015 00:39:33 +0000 (00:39 +0000)]
Refactor PCI resource allocation
Refactor PCI resource allocation code to allow a request for a
memory-mapped I/O window that is a multiple of a requested size.
This is needed by the SR-IOV code because the VF BARs are all
allocated contiguously. We can't just allocate a resource that is
a multiple of a single VF BAR because the size of an allocation
implies its alignment requirement.
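The alignment point can be illustrated with a toy allocator that takes the size of one window and a count separately. toy_alloc_multi() is hypothetical and searches a simple inclusive range; it is not the refactored PCI code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Find the lowest base in [start, end] for 'count' back-to-back windows
 * of 'size' bytes each.  The base is aligned to 'size' only, not to the
 * total length.  Returns 0 on success and -1 if the range is too small.
 */
static int
toy_alloc_multi(uint64_t start, uint64_t end, uint64_t size,
    unsigned int count, uint64_t *basep)
{
	uint64_t base, total;

	assert(size != 0 && (size & (size - 1)) == 0);	/* power of two */
	base = (start + size - 1) & ~(size - 1);	/* align to one window */
	total = size * count;
	if (base < start || total == 0 || base > end ||
	    end - base < total - 1)
		return (-1);
	*basep = base;
	return (0);
}
```

Three 16 KB windows need 48 KB, which is not a power of two; a request for a single 48 KB resource would therefore imply an alignment that is neither required nor well defined.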
Ryan Stone [Sun, 1 Mar 2015 00:37:23 +0000 (00:37 +0000)]
Fix build of nv_tests.cc
nv_tests.cc managed to get two copies of several functions due to me
applying a patch in an unclean working tree. My kingdom for an
"svn clean" command.
Ryan Stone [Sun, 1 Mar 2015 00:22:53 +0000 (00:22 +0000)]
Add macros to make code compile in kernel
Make it possible to compile libnv in the kernel. Mostly this
involves wrapping functions that have a different signature in
the kernel and in userland (e.g. malloc()) in a macro that will
conditionally expand to the right API depending on whether the
code is being compiled for the kernel or not.
I have also #ifdef'ed out all of the file descriptor-handling code,
as well as the unsafe varargs functions.
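The wrapping technique looks roughly like this. The macro names and the M_TOYNV malloc type are made up for the sketch; only the general shape of kernel malloc(9), which takes a type and flags, and userland malloc(3) is relied upon, and the userland headers shown would differ in a kernel build.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef _KERNEL
/* Kernel malloc(9) takes a malloc type and flags; M_TOYNV is hypothetical. */
#define	toy_nv_malloc(size)	malloc((size), M_TOYNV, M_NOWAIT)
#define	toy_nv_free(ptr)	free((ptr), M_TOYNV)
#else
#define	toy_nv_malloc(size)	malloc((size))
#define	toy_nv_free(ptr)	free((ptr))
#endif

/* Shared code calls only the wrappers, so it builds in both environments. */
static char *
toy_nv_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = toy_nv_malloc(len);

	if (copy != NULL)
		memcpy(copy, s, len);
	return (copy);
}
```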
Ryan Stone [Sun, 1 Mar 2015 00:22:31 +0000 (00:22 +0000)]
Don't allocate memory for operations that do not insert
Almost every operation performed on an nvlist was allocating a
new string to hold the key name. The nvlist_exists* family of
functions would always return false if they failed to allocate
the string. The rest of the functions would outright abort().
Fix the non-varargs variants of the functions to perform the
requested operations directly and the varargs versions to
allocate the string and call into the non-varargs versions.
The varargs versions are still broken and really can't be fixed,
so we might consider axing them entirely. However, now the non-
varargs functions are always safe to call.
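The split between the two families can be sketched as below, with a hypothetical toy_nvlist that is just an array of names. The plain lookup never allocates; the format-string variant has to build the key first and so can still fail for lack of memory.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_nvlist {
	const char *const *names;
	size_t count;
};

/* Non-varargs lookup: compares against the caller's string directly. */
static bool
toy_exists(const struct toy_nvlist *nvl, const char *name)
{
	for (size_t i = 0; i < nvl->count; i++)
		if (strcmp(nvl->names[i], name) == 0)
			return (true);
	return (false);
}

/* Varargs variant: formats the key into a new string, then delegates. */
static bool
toy_existsf(const struct toy_nvlist *nvl, const char *fmt, ...)
{
	va_list ap;
	char *name;
	int len;
	bool found;

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);	/* measure the key */
	va_end(ap);
	if (len < 0 || (name = malloc((size_t)len + 1)) == NULL)
		return (false);	/* indistinguishable from "not found" */
	va_start(ap, fmt);
	vsnprintf(name, (size_t)len + 1, fmt, ap);
	va_end(ap);
	found = toy_exists(nvl, name);
	free(name);
	return (found);
}
```

The early return in toy_existsf() shows why the varargs form cannot be fully repaired: an allocation failure and a missing key look the same to the caller.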
Ryan Stone [Sun, 1 Mar 2015 00:22:23 +0000 (00:22 +0000)]
Add function to force an nvlist into the error state
Add an nvlist_set_error() function that can be used to force an
nvlist into the error state. This is useful both for writing
tests and for writing APIs that use nvlists internally.
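The idea of a forced error state can be modelled with a toy list that carries a sticky error code. This is an illustration only: struct toy_nvl and its helpers are hypothetical, and the behaviours shown (the first error wins, insertions are ignored once an error is set) are assumptions of the sketch rather than a description of libnv.

```c
#include <assert.h>
#include <errno.h>

struct toy_nvl {
	int error;	/* 0 while the list is healthy */
	int nitems;
};

/* Force the list into the error state; the first error is kept. */
static void
toy_set_error(struct toy_nvl *nvl, int error)
{
	assert(error != 0);
	if (nvl->error == 0)
		nvl->error = error;
}

/* Insertions become no-ops once the list is in the error state. */
static void
toy_add(struct toy_nvl *nvl)
{
	if (nvl->error != 0)
		return;
	nvl->nitems++;
}
```

A test can use the setter to exercise error paths without having to provoke a real allocation failure, and an API built on such a list can report its own failures through the same channel.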