MFC 279929:
Allow the EFI loader to work with large kernels and/or modules
(for example, a large mfsroot). Note that for EFI the kernel and
modules (as well as other metadata files such as splash screens or
memory disk images) are loaded into a statically-sized staging area.
When the EFI loader exits it copies this staging area down to the
location the kernel expects to run at.
- Add bounds checking to the copy routines to fail attempts to access
memory outside of the staging area. Previously, loading a combined
kernel + modules larger than the staging size (32MB) would overflow
the staging area, trashing whatever memory followed it. Under
Intel's OVMF firmware for qemu this resulted in fatal faults in the
firmware itself. Now the attempt will fail with ENOMEM.
- Allow the staging area size to be configured at compile time via
an EFI_STAGING_SIZE variable in src.conf or on the command line.
It accepts the size of the staging area in MB. The default size
remains 32MB.
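As a rough illustration of the bounds check described above, the sketch
below rejects any copy that would leave the staging area. The names
stage_copyin and stage_size, and the way EFI_STAGING_SIZE is consumed as a
preprocessor macro, are assumptions made for this example and not the
loader's actual code. Only the 32MB default, the size-in-MB convention and
the ENOMEM result come from the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for the build-time knob: staging size in MB. */
#ifndef EFI_STAGING_SIZE
#define EFI_STAGING_SIZE 32
#endif
#define STAGE_BYTES ((size_t)EFI_STAGING_SIZE * 1024 * 1024)

/* Hypothetical staging area; the real loader obtains it from firmware. */
static uint8_t staging[STAGE_BYTES];

/* Size of the staging area in bytes. */
size_t
stage_size(void)
{
	return (STAGE_BYTES);
}

/*
 * Copy len bytes from src to offset dest_off inside the staging area.
 * Returns 0 on success, or ENOMEM if any part of the destination range
 * lies outside the staging area.
 */
int
stage_copyin(const void *src, size_t dest_off, size_t len)
{
	/* Compare against the remaining space so dest_off + len cannot wrap. */
	if (dest_off > STAGE_BYTES || len > STAGE_BYTES - dest_off)
		return (ENOMEM);
	memcpy(staging + dest_off, src, len);
	return (0);
}
```

With a build such as "cc -DEFI_STAGING_SIZE=64", the same check would
permit copies up to 64MB. The macro-on-the-command-line form is only how
this sketch models the knob.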
MFC r280702: Make swapper release orphaned (lost) GEOM provider.
The swap device is still reported as enabled, and the system may still crash
later if some swapped-out kernel pages were lost with the device, but at least
GEOM and CAM can now release the lost disk, allowing it to be reconnected.
cxgbe(4): there is no need to force an "unimplemented" panic.
The calls to free_nm_txq and free_nm_rxq are made just a few lines prior
to the panic.
cxgbe(4): Add a minimal if_cxl module that pulls in the real driver as
a dependency. This ensures "ifconfig cxl<n> ..." does the right thing
even when it's run with no driver loaded.
MFC r279243-r279246, r279251, r279691, r279700, and r279701.
r279243:
cxgbe(4): request an automatic tx update when a netmap txq idles.
r279244:
cxgbe(4): wait for the hardware to catch up before destroying a netmap txq.
r279245:
cxgbe(4): do not set the netmap rxq interrupts on a hair-trigger.
r279246:
cxgbe(4): set up congestion management for netmap rx queues.
The hw.cxgbe.cong_drop knob controls the response of the chip when
netmap queues are congested.
r279251:
cxgbe(4): allow tx hardware checksumming on the netmap interface.
It is disabled by default but users can set IFCAP_TXCSUM on the
netmap ifnet (ifconfig ncxl0 txcsum) to override netmap and force
the hardware to calculate and insert proper IP and L4 checksums in
outbound frames.
r279691:
cxgbe(4): provide the correct size of freelists associated with netmap
rx queues to the chip. This will fix many problems with native netmap
rx on ncxl/ncxgbe interfaces.
r279700:
cxgbe(4): knobs to experiment with the interrupt coalescing timer for
netmap rx queues, and the "batchiness" of rx updates sent to the chip.
These knobs will probably become per-rxq in the near future and will be
documented only after their final form is decided.
r279701:
cxgbe(4): experimental rx packet sink for netmap queues. This is not
intended for general use.
MFC r281006
When an mbuf allocation fails in the receive path, the mbuf containing
the received packet is not sent to the host network stack and is reused
on the receive ring. The remaining received packets in the ring are not
processed in that invocation of bxe_rxeof() and are deferred to the task
thread.
r275539:
cxgbe(4): Allow for different pad and pack boundaries for different
adapters. Set the pack boundary for T5 cards to be the same as the
PCIe max payload size. The chip likes it this way.
In this revision the driver allocates rx buffers that align on both
boundaries. This is not a strict requirement and a followup commit
will switch the driver to a more relaxed allocation strategy.
r275554:
cxgbe(4): allow the driver to use rx buffers that do not end on a pack
boundary.
Fix some bad interaction between cxgbe(4) and lacp lagg(4) that could
leave a port permanently disabled when a copper cable is unplugged and
then plugged right back in.
lacp_linkstate goes looking for the current ifmedia on a link state
change and it could get stale information from cxgbe(4) on a module
unplug followed by replug. The fix is to process module events before
link-state events within the driver, and to always rebuild the ifmedia
list on a module change event (instead of rebuilding it lazily).
Thanks to asomers@ for the problem report and detailed analysis to go
with it.
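The ordering fix can be pictured with the following sketch. Everything in
it (the event flags, struct port_state and the handler names) is
hypothetical and only mirrors the two rules stated above: module events
are handled before link-state events, and the media list is rebuilt
eagerly on every module change.

```c
/* Hypothetical pending-event flags for one port. */
#define EV_MODULE 0x1u
#define EV_LINK   0x2u

struct port_state {
	int module_generation;   /* bumped on every module unplug/replug */
	int media_generation;    /* module generation the media list matches */
	int reported_generation; /* media generation seen by the link report */
};

/* Rebuild the media list immediately instead of lazily. */
void
rebuild_media(struct port_state *p)
{
	p->media_generation = p->module_generation;
}

/* Stand-in for notifying upper layers (for example lagg) of link state. */
void
report_link_state(struct port_state *p)
{
	p->reported_generation = p->media_generation;
}

/* Process module events first so link reports never see stale media. */
void
handle_port_events(struct port_state *p, unsigned int pending)
{
	if (pending & EV_MODULE) {
		p->module_generation++;
		rebuild_media(p);
	}
	if (pending & EV_LINK)
		report_link_state(p);
}
```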
When checking the length of the mutual secret password the variable for
the secret password was used by mistake. This resulted in ctld never
warning about the length of the mutual secret being wrong even if it was.
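A minimal sketch of this class of bug and its fix follows. The function
names and the 12 to 16 character limits are assumptions made for
illustration and are not taken from the ctld sources. The point is only
that each secret must be checked through its own variable.

```c
#include <stdio.h>
#include <string.h>

/* Assumed acceptable secret length range, for illustration only. */
enum { SECRET_MIN = 12, SECRET_MAX = 16 };

/* Returns nonzero if the secret length is within the assumed range. */
int
secret_length_ok(const char *s)
{
	size_t len = strlen(s);

	return (len >= SECRET_MIN && len <= SECRET_MAX);
}

/*
 * Check both CHAP secrets and return the number of warnings issued.
 * The bug described above amounts to passing `secret` to the second
 * check as well, so a bad mutual secret was never reported.
 */
int
check_chap_mutual(const char *secret, const char *mutual_secret)
{
	int warnings = 0;

	if (!secret_length_ok(secret)) {
		fprintf(stderr, "secret has an unexpected length\n");
		warnings++;
	}
	if (!secret_length_ok(mutual_secret)) {	/* not `secret` */
		fprintf(stderr, "mutual secret has an unexpected length\n");
		warnings++;
	}
	return (warnings);
}
```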
dim [Mon, 6 Apr 2015 14:50:54 +0000 (14:50 +0000)]
MFC r280864:
Pull in r233552 from upstream libc++ trunk (by Eric Fiselier):
[libcxx] Fix PR22771 - Support access control SFINAE in the library
version of is_convertible.
Summary:
Currently the conversion check does not take place in a context where
access control SFINAE is applied. This patch changes the context of
the test expression so that SFINAE occurs if access control does not
permit the conversion.
Related bug: https://llvm.org/bugs/show_bug.cgi?id=22771
dim [Sun, 5 Apr 2015 15:27:56 +0000 (15:27 +0000)]
Ensure yacc is built during bootstrap-tools for __FreeBSD_version 1001506
and earlier, since some of the ACPI tools now reach yacc's old
maximum table limit. This should fix the Jenkins buildbot, which
apparently runs 10.1-RELEASE.
MFC r258056 (by alc):
Eliminate the gratuitous use of mmap(2) flags from the implementation
of kern_shmat(). Use a simpler approach to determine whether to pass
VMFS_NO_SPACE or VMFS_OPTIMAL_SPACE to vm_map_find().
Since r202992, catopen(3) caches the result when it returns an error.
The refcount on that cache entry is not initialized, so any attempt to
clean the cache will skip over the entry, since its refcount likely holds
a value greater than zero.
MFC 276724:
On some Intel CPUs with a P-state but not C-state invariant TSC the TSC
may also halt in C2 and not just C3 (it seems that in some cases the BIOS
advertises its C3 state as a C2 state in _CST). Just play it safe and
disable both C2 and C3 states if a user forces the use of the TSC as the
timecounter on such CPUs.
MFC 261790:
Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
PCI domain/segment. Host-PCI bridge drivers use this API to allocate
bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
full range of bus numbers from secbus to subbus from their parent bridge.
The drivers also always program their primary bus register. The bridge
drivers also support growing their bus range by extending the bus resource
and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.
MFC 260973:
- Reuse legacy_pcib_(read|write)_config() methods in the QPI pcib driver.
- Reuse legacy_pcib_alloc_msi{,x}() methods in the QPI and mptable pcib
drivers.
MFC 278761:
Include OBJT_PHYS VM objects in ELF core dumps. In particular this
includes the shared page allowing debuggers to use the signal trampoline
code to identify signal frames in core dumps.
Revert r280449;
Permit multiple arguments for the nonnull attribute.
For the benefit of anyone that may be struggling to port
FreeBSD to gcc 2.8 (or older) avoid using variadic macros.
MFC r280700 (partial);
Bring new attribute:
__result_use_check
Causes a warning to be emitted if a caller of the function
with this attribute does not use its return value. This is
known in gcc as "warn_unused_result" but we considered the
original naming unsuitable for an attribute.
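For illustration, an attribute like this could be defined and used as
sketched below on GCC-compatible compilers. The macro definition shown is
an assumption based on the description above and is not copied from the
tree, and parse_port is a hypothetical function used only to show where
the attribute goes.

```c
#include <stddef.h>

/* Assumed definition; expands to nothing on other compilers. */
#ifndef __result_use_check
#if defined(__GNUC__)
#define __result_use_check __attribute__((__warn_unused_result__))
#else
#define __result_use_check
#endif
#endif

/* Hypothetical function whose result callers are expected to check. */
__result_use_check int parse_port(const char *s, int *port);

/* Parse a decimal port number; returns 0 on success, -1 on error. */
int
parse_port(const char *s, int *port)
{
	int value = 0;

	if (s == NULL || *s == '\0')
		return (-1);
	for (; *s != '\0'; s++) {
		if (*s < '0' || *s > '9')
			return (-1);
		value = value * 10 + (*s - '0');
		if (value > 65535)
			return (-1);
	}
	*port = value;
	return (0);
}
```

A bare call such as "parse_port(arg, &port);" that discards the result
would then draw a compiler warning, while "if (parse_port(arg, &port) != 0)"
would not.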
jhb [Tue, 31 Mar 2015 15:37:24 +0000 (15:37 +0000)]
MFC 278760:
Add two new counters for vnode life cycle events:
- vfs.recycles counts the number of vnodes forcefully recycled to avoid
exceeding kern.maxvnodes.
- vfs.vnodes_created counts the number of vnodes created by successful
calls to getnewvnode().
kib [Tue, 31 Mar 2015 00:57:25 +0000 (00:57 +0000)]
MFC r280435:
When mapping an allocated entry, use the entry size instead of the
requested size. If tag restrictions caused the entry to be split, its
size is less than requested.
jhb [Mon, 30 Mar 2015 16:28:04 +0000 (16:28 +0000)]
Revert accidental(?) change in r280455 and do not compile hwpmc statically
into GENERIC by default. This change is not present in HEAD and was not
made in the two commits to HEAD that r280455 merged.
mav [Mon, 30 Mar 2015 07:11:49 +0000 (07:11 +0000)]
MFC r280134:
Report ARAT (APIC-Timer-always-running) feature for virtual CPU.
This stops a FreeBSD guest from avoiding the LAPIC timer and preferring
HPET out of concern about deep sleep states, which do not exist for
virtual CPUs.
Benchmarks of usleep(1) on guest and host show the following extra latencies:
- 51us for virtual HPET,
- 22us for virtual LAPIC timer,
- 22us for host HPET and
- 3us for host LAPIC timer.
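The commit message does not say how these numbers were gathered. One
plausible way to measure such extra latency, assuming usleep(1) here means
a 1-microsecond sleep request, is sketched below. The function name and
the use of CLOCK_MONOTONIC are choices made for this example.

```c
#define _DEFAULT_SOURCE
#include <time.h>
#include <unistd.h>

/*
 * Return the mean extra latency, in microseconds, of usleep(requested_us)
 * over the given number of iterations: measured elapsed time minus the
 * requested sleep time. Returns 0.0 if iterations is not positive.
 */
double
usleep_extra_latency_us(unsigned int requested_us, int iterations)
{
	struct timespec start, end;
	double total_us = 0.0;
	int i;

	if (iterations <= 0)
		return (0.0);
	for (i = 0; i < iterations; i++) {
		clock_gettime(CLOCK_MONOTONIC, &start);
		usleep(requested_us);
		clock_gettime(CLOCK_MONOTONIC, &end);
		total_us += (double)(end.tv_sec - start.tv_sec) * 1e6 +
		    (double)(end.tv_nsec - start.tv_nsec) / 1e3;
	}
	return (total_us / iterations - (double)requested_us);
}
```

Calling usleep_extra_latency_us(1, 10000) on the guest and on the host
under each timer would give figures comparable in kind to those listed
above. Absolute values depend on the hardware and the event timer in use.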
mav [Fri, 27 Mar 2015 08:53:59 +0000 (08:53 +0000)]
MFC r280037:
Rewrite virtio block device driver to work asynchronously and use the block
I/O interface.
Asynchronous operation, based on the r280026 change, avoids blocking the
virtual CPU during I/O processing, which can take seconds on slow/busy storage.
Use of the recently improved block I/O interface allows multiple requests
to be processed at the same time, which improves random I/O performance on
wide storage pools.
Benchmarks of a virtual disk, backed by a ZVOL on a RAID10 pool of 4 HDDs,
show a ~3.5 times improvement in random read performance with no degradation
of linear I/O. Guest CPU usage during the test dropped from 100% to almost zero.
mav [Fri, 27 Mar 2015 08:52:57 +0000 (08:52 +0000)]
MFC r280026, r280041:
Modify virtqueue helpers added in r253440 to allow queuing.
The original virtqueue design allows queued and out-of-order processing, but
the helpers added in r253440 assume only direct, blocking, in-order processing.
That may be fine for network devices and the like, but it is a huge limitation
for storage devices.
mav [Fri, 27 Mar 2015 08:51:20 +0000 (08:51 +0000)]
MFC r280004: Give block I/O interface multiple (8) execution threads.
On parallel random I/O this allows better utilization of wide storage pools.
To avoid confusing the prefetcher on linear I/O, consecutive requests are
executed sequentially, following the same logic as was earlier implemented
in CTL.
Benchmarks of a virtual AHCI disk, backed by a ZVOL on a RAID10 pool of 4 HDDs,
show a ~3.5 times improvement in random read performance with no degradation
of linear I/O.
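The scheduling code itself is not shown in this log. As a sketch under
stated assumptions, one way to keep consecutive requests in order while
letting unrelated requests run in parallel is to hold back any request
that begins exactly where an in-flight request ends. The struct and
function names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one block request. */
struct blk_req {
	uint64_t offset;	/* starting byte offset */
	uint64_t length;	/* length in bytes */
};

/* True if `next` starts exactly where `prev` ends. */
bool
req_follows(const struct blk_req *prev, const struct blk_req *next)
{
	return (prev->offset + prev->length == next->offset);
}

/*
 * True if `req` directly follows any of the n in-flight requests and
 * should therefore wait, so that linear I/O reaches the backing store
 * in order. Other requests may be handed to any idle worker thread.
 */
bool
req_must_wait(const struct blk_req *inflight, size_t n,
    const struct blk_req *req)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (req_follows(&inflight[i], req))
			return (true);
	}
	return (false);
}
```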