hyperv: Register Hyper-V timer early enough for TSC freq calibration
The i8254 simulation in Hyper-V is kinda broken and is not available
in Generation 2 Hyper-V VMs, so Hyper-V timer must be registered early
enough so that it can be used to do the TSC freq calibration.
This fixes the notorious warning like this:
calcru: runtime went backwards from 50 usec to 25 usec for pid 0 (kernel)
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: kib, sephe
Tested by: kib, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5778
case 'g' makes use of value n, which is initialized for case 'b'
and passe through to case 'g'. If case 'g' is called directly
value 'n' is not initialized.
Avoid the issue by initializing n before entering the switch.
John Baldwin [Thu, 31 Mar 2016 21:25:40 +0000 (21:25 +0000)]
Correct error messages in recently added test.
The large read test uses an empty file created via mkstemp() rather than
/dev/null as character devices are subject to two different clamping
sysctls. However, I forgot to update some of the error messages after
changing to mkstemp() that were still referring to /dev/null.
Ed Schouten [Thu, 31 Mar 2016 18:52:00 +0000 (18:52 +0000)]
Make Position Independent Executables work for CloudABI.
- Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to
regular ET_EXECs.
- Provide an AT_BASE entry in the auxiliary vector, so the executable
knows at which address it got loaded and can apply relocations.
Ed Schouten [Thu, 31 Mar 2016 18:50:06 +0000 (18:50 +0000)]
Sync in the latest CloudABI system call definitions.
Some time ago I made a change to merge together the memory scope
definitions used by mmap (MAP_{PRIVATE,SHARED}) and lock objects
(PTHREAD_PROCESS_{PRIVATE,SHARED}). Though that sounded pretty smart
back then, it's backfiring. In the case of mmap it's used with other
flags in a bitmask, but for locking it's an enumeration. As our plan is
to automatically generate bindings for other languages, that looks a bit
sloppy.
Change all of the locking functions to use separate flags instead.
John Baldwin [Thu, 31 Mar 2016 18:10:29 +0000 (18:10 +0000)]
Rework handling of thread sleeps before timers are working.
Previously, calls to *sleep() and cv_*wait*() immediately returned during
early boot. Instead, permit threads that request a sleep without a
timeout to sleep as wakeup() works during early boot. Sleeps with
timeouts are harder to emulate without working timers, so just punt and
panic explicitly if any thread tries to use those before timers are
working. Any threads that depend on timeouts should either wait until
SI_SUB_KICK_SCHEDULER to start or they should use DELAY() until timers
are available.
Until APs are started earlier this should be a no-op as other kthreads
shouldn't get a chance to start running until after timers are working
regardless of when they were created.
John Baldwin [Thu, 31 Mar 2016 17:27:30 +0000 (17:27 +0000)]
Tidy up the unmapped I/O code in qphysio.
- Move some blocks around to reduce the number of 'if (unmap)' checks.
- Use 'pbuf == NULL' instead of 'unmap'.
- Use nitems.
- Pull an assignment out of an if expression.
Bryan Drewery [Thu, 31 Mar 2016 17:27:17 +0000 (17:27 +0000)]
WITHOUT_TOOLCHAIN: Skip building of h_raw.
-fsanitize does not seem to work when a --sysroot is specified and there
is no <sysroot>/usr/lib/clang/3.8.0/lib/freebsd/libclang_rt.ubsan_standalone-*.a.
Bryan Drewery [Thu, 31 Mar 2016 17:27:01 +0000 (17:27 +0000)]
WITHOUT_TOOLCHAIN: Fix build of rtld.
MK_TOOLCHAIN==no disables building and installing of pic archives.
c_pic.a is still needed for rtld though so force it to build in lib/libc
and link directly to the objdir version of it for rtld.
Zbigniew Bodek [Thu, 31 Mar 2016 16:44:32 +0000 (16:44 +0000)]
Fix number of the enabled VFs in VNIC
nic->num_vf_en is set based on the number of the enabled LMACs.
This number should not be overwritten later by any routine.
Instead it should fail PCI_IOV_ADD_VF() so that available VFs
with the corresponding LMACs will attach whereas other, disabled
VFs will fail with the proper error code.
Error signaling (due to improper number of VFs requested) is also moved
from PCI_IOV_INIT() to PCI_IOV_ADD_VF().
This will be reworked when multiple queue sets are enabled but for
now this is the correct behavior of the driver.
Zbigniew Bodek [Thu, 31 Mar 2016 13:23:43 +0000 (13:23 +0000)]
Don't omit m_dup() for non-writeable mbufs that need checksum calculation
If the driver is not active or link is down the packet could remain
non-writeable. This commit makes all mbufs enqueued to the driver's
ring buffer to have correct attributes.
Zbigniew Bodek [Thu, 31 Mar 2016 13:13:38 +0000 (13:13 +0000)]
Fix MAC address configuration for VNIC
The FDT description is as follows:
- phy-handle, reg, qlm-mode, mac-address are under nodes in bgx0/1 node
- phy nodes (pointed by phy-handle) are under MDIO even though they may
not be connected through to MDIO. In those nodes they do not contain
MAC address or etc.
This commit changes parsing of the FDT nodes for BGX so that it can
obtain correct MAC address for a given PHY.
Zbigniew Bodek [Thu, 31 Mar 2016 13:10:29 +0000 (13:10 +0000)]
Improve TX path of the VNIC driver
- Avoid memory leak when nicvf_tx_mbuf_locked() fails
- Introduce nicvf_xmit_locked() routine that uses drbr_peek(),
drbr_advance() or drbr_putback() for a specific ifnet.
This gives more clear and efficient design as well as
prevents from dropping mbufs that where not sent due to temporary
lack of descriptors.
- Add missing ETHER_BPF_MTAP() hook
Zbigniew Bodek [Thu, 31 Mar 2016 11:18:52 +0000 (11:18 +0000)]
Disable MSI-x for AHCI on Alpine plattform
Changes introduced to AHCI code adding support for MSI-x
caused interrupt storm on Alpine boards.
This is unintended behaviour so added quirk to omit this functionality.
Andrew Turner [Thu, 31 Mar 2016 11:07:24 +0000 (11:07 +0000)]
Add support for 4 level pagetables. The userland address space has been
increased to 256TiB. The kernel address space can also be increased to be
the same size, but this will be performed in a later change.
To help work with an extra level of page tables two new functions have
been added, one to file the lowest level table entry, and one to find the
block/page level. Both of these find the entry for a given pmap and virtual
address.
This has been tested with a combination of buildworld, stress2 tests, and
by using sort to consume a large amount of memory by sorting /dev/zero. No
new issues are known to be present from this change.
Reviewed by: kib
Obtained from: ABT Systems Ltd
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5720
Adrian Chadd [Thu, 31 Mar 2016 04:57:38 +0000 (04:57 +0000)]
Add support for the Nuvoton NCT5104D.
Make it compile only for i386/amd64 for now as it's been tested there.
It's quite possible it'll show up elsewhere and we can enable it
for other architectures later.
Bryan Drewery [Thu, 31 Mar 2016 03:04:26 +0000 (03:04 +0000)]
hosttools: Trim unneeded directories.
These should only be build tools that are in various Makefile.depend
as host dependencies. Anything toolchain related is handled by
toolchain and bootstrap-tools currently.
Bryan Drewery [Thu, 31 Mar 2016 00:26:40 +0000 (00:26 +0000)]
DIRDEPS_BUILD: Don't reset OBJROOT in sub-makes.
MAKEOBJDIRPREFIX is set to blank and exported from MAKELEVEL0 along
with OBJROOT exported. In sub-makes OBJROOT is recalculated with
an empty MAKEOBJDIRPREFIX though.
Bryan Drewery [Wed, 30 Mar 2016 23:50:29 +0000 (23:50 +0000)]
Fix the external GCC build after r297271 by setting -L <sysroot>/usr/lib.
GCC does add <sysroot>/usr/lib to the library search path but it comes after
/usr/local/lib which can find ports libraries such as libedit.so. The
bad path comes in as /usr/local/lib/gcc/x86_64-portbld-freebsd11.0/5.3.0/../../../
which corresponds to <prefix>/lib.
This partially reverts r297271.
Pointyhat to: bdrewery
Sponsored by: EMC / Isilon Storage Division
Ed Maste [Wed, 30 Mar 2016 14:42:09 +0000 (14:42 +0000)]
libc: stop exporting cerror
i386 stopped exporting .cerror in r240152, and likewise for amd64 in
r240178. It is not used by other libraries on any platform, so apply
the same change to the remaining architectures.
Reviewed by: jhibbits, jilles
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5774
Navdeep Parhar [Wed, 30 Mar 2016 01:08:08 +0000 (01:08 +0000)]
Remove unnecessary dequeue_mutex (added in r294610) from the iWARP
connection manager. Examining so_comp without synchronization with
iw_so_event_handler is a harmless race.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Reviewed by: Steve Wise @ Open Grid Computing
Sponsored by: Chelsio Communications
Adrian Chadd [Wed, 30 Mar 2016 00:42:18 +0000 (00:42 +0000)]
[net80211] Add fields to decode uAPSD fields.
It turns out that madwifi actually has the basics for uAPSD implemented
but it was never ported to FreeBSD. I may eventually port most of the
pieces; I'll see how it goes!
Do not access buffer if bread(9) or cluster_read(9) failed. On error,
the functions free the buffer and set the pointer to NULL. Also
remove useless call to brelse(9) on the error path.
Gleb Smirnoff [Tue, 29 Mar 2016 19:57:11 +0000 (19:57 +0000)]
The sendfile(2) allows to send extra data from userspace before the file
data (headers). Historically the size of the headers was not checked
against the socket buffer space. Application could easily overcommit the
socket buffer space.
With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space. In case when size of headers is bigger that socket space,
KASSERT fires. Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.
o With this change, the headers copyin is moved down into the cycle, after
the sbspace() check. The uio size is trimmed by socket space there,
which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
up the stack to syscall wrappers. This required a copy and paste of the
code, but in turn this allowed to remove extra stack carried parameter
from fo_sendfile_t, and embrace entire compat code into #ifdef. If in
future we got more fo_sendfile_t function, the copy and paste level would
even reduce.
Fix several bugs in r297374:
- fix UP build [1]
- do not obliterate initial reading of rdtsc by the loop counter [2]
- restore the meaning of the argument -1 to native_lapic_ipi_wait()
as wait until LAPIC acknowledge without timeout
- correct formula for calculating loop iteration count for 1us, it was
inverted, and ensure that even on unlikely slow CPUs at least one
check for ack is performed.
Reported by: Michael Butler <imb@protected-networks.net> [1], rpokala[2],
jhb[3]
Tested by: Michael Butler
Pointy hat to: kib
Sponsored by: The FreeBSD Foundation
Mark Johnston [Tue, 29 Mar 2016 19:23:00 +0000 (19:23 +0000)]
Modify nd6_llinfo_timer() to acquire the nd6 lock before the LLE lock.
When expiring a neighbour cache entry we may need to look up the associated
default router, which requires the nd6 read lock. To avoid an LOR, the nd6
lock should be acquired first.
X-MFC-With: r296063
Tested by: Larry Rosenman <ler@lerctr.org> (previous revision)
Alexander Motin [Tue, 29 Mar 2016 19:18:34 +0000 (19:18 +0000)]
Modify "4958 zdb trips assert on pools with ashift >= 0xe".
Unlike Illumos FreeBSD has concept of logical ashift, that specifies
really minimal vdev block size that can be accessed. This knowledge
allows properly pad physical I/O and correctly assert its alignment.
This change fixes L2ARC write errors when device has logical sector
size above 512 bytes.
This driver works in PIO mode for now, interrupts are available only when
FIFO is enabled. The FIFO cannot be used with arbitrary sizes which defeat
its general use.
At some point we can add DMA transfers where the FIFO can be more useful.
Bryan Drewery [Tue, 29 Mar 2016 16:07:51 +0000 (16:07 +0000)]
Reword descriptions of asserting locks held without WITNESS.
This corrects an error in r296947 in that it is not possible to assert
which thread holds a shared (or read) lock, but it is possible to assert
that one is held. Just not very useful.
Import portions of the PowerPC OF PCI implementation into new file
"ofwpci.c", common for other platforms. The files ofw_pci.c and ofw_pci.h
from sys/powerpc/ofw no longer exist. All required declarations are moved
to sys/dev/ofw/ofwpci.h. This creates a new ofw_pci_write_ivar() function
and modifies some others methods. Most functions contain existing ppc
implementations in the majority unchanged. Now there is no need to have
multiple identical copies of methods for various architectures.
Andrew Turner [Tue, 29 Mar 2016 13:51:26 +0000 (13:51 +0000)]
Read the CPU ID for the current CPU from the GIC. The GIC may have a
different ID space than the kernel. Because of this we need to read the
ID from the hardware. The hardware will provide this value to the CPU by
reading any of the first 8 Interrupt Processor Targets Registers.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5706
Zbigniew Bodek [Tue, 29 Mar 2016 13:31:09 +0000 (13:31 +0000)]
Improve HW checksums support in VNIC
- Do not mark CSUM_IP_CHECKED and CSUM_IP_VALID on IPv6 packets.
IPv6 does not have checksums by definition.
- Set SCTP packets csum_flags CSUM_SCTP_VALID instead of
CSUM_DATA_VALID and skip csum_data
- Set csum_data simply as 0xffff without byteswap
Calibrate the frequency of the of the native_lapic_ipi_wait() loop,
and avoid a delay while waiting for IPI delivery acknowledgement in
xAPIC mode. This makes the loop exit immediately after the delivery
bit in APIC_ICR register is set, instead of waiting for some
microseconds.
We only need to ensure that some amount of time is allowed for the
LAPIC to react to the command, and we need that the wait time is
finite and reasonable. For that reasons, it is irrelevant if the CPU
frequency or throttling decrease the speed and make the loop,
calibrated for full CPU speed at boot time, execute somewhat slower.
Discussed with: bde, jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
John Baldwin [Mon, 28 Mar 2016 21:51:56 +0000 (21:51 +0000)]
Don't start the random harvester process until timers are working.
This is a no-op currently, but in kernels with earlier AP startup, the
random kthread was trying to use timeouts with sleeps before timers are
working. Wait until SI_SUB_KICK_SCHEDULER to start the random kproc.