cxgbe(4): Fix a couple of problems in the sge_wrq data path.
- start_wrq_wr must not drain the wr_list if there are incomplete_wrs
pending. This can happen when a t4_wrq_tx runs between two
start_wrq_wr.
- commit_wrq_wr must examine the cookie's pidx and ndesc with the
queue's lock held. Otherwise there is a bad race when incomplete WRs
are being completed and commit_wrq_wr for the WR that is ahead in the
queue updates the next incomplete WR's cookie's pidx/ndesc but the
commit_wrq_wr for the second one is using stale values that it read
without the lock.
gordon [Sat, 9 Sep 2017 03:09:02 +0000 (03:09 +0000)]
The purge option hasn't been implemented since 1994 when we imported this
code. I think it is safe to say it's not going to be. I'm also working to
de-orbit catman, so remove the reference in the manpage.
Add P5021 and P5040 conditions for LAW count check.
P5040/P5021 have the same number of LAWs as P5020. There may be a better way of
getting the count from the FDT (fsl,num-laws property on soc/corenet-law or
soc/ecm-law), but that's not supported everywhere, so we still need this check
for those other cases.
These processors may not be supported yet, but add them for completion.
POWER9 is planned for support. e300 may work (based on 603e core).
P5040/P5021 are similar to P5020, so should work as well. One addition is
needed for P5040, to support the number of LAWs, and will be a separate commit.
cem [Sat, 9 Sep 2017 01:41:01 +0000 (01:41 +0000)]
Fix information leak in geli(8) integrity mode
In integrity mode, a larger logical sector (e.g., 4096 bytes) spans several
physical sectors (e.g., 512 bytes) on the backing device. Due to hash
overhead, a 4096 byte logical sector takes 8.5625 512-byte physical sectors.
This means that only 288 bytes (256 data + 32 hash) of the last 512 byte
sector are used.
The memory allocation used to store the encrypted data to be written to the
physical sectors comes from malloc(9) and does not use M_ZERO.
Previously, nothing initialized the final physical sector backing each
logical sector, aside from the hash + encrypted data portion. So 224 bytes
of kernel heap memory was leaked to every block :-(.
This patch addresses the issue by initializing the trailing portion of the
physical sector in every logical sector to zeros before use. A much simpler
but higher overhead fix would be to tag the entire allocation M_ZERO.
Enhance qpi.c to make it usable on all Core-microarchitecture Xeons.
Scan all buses for CSR bus, not stopping on the first failed
match. Scan all slots for function 0 on the found bus, for instance on
IvyBridge the slot 0 is not decoded at all. Since the scan is quite
unsafe, and access to the buses is mostly useful for developers,
enable the csr buses scan with the tunable.
Current qpi.c makes too many assumptions about the uncore
configuration buses location and about slots occupied. Also it
restricts itself only to Nehalem CPUs. It is needed on all Core-based
Xeons. On the 2600 v2 (IvyBridge) machine I have access to, the CSR
buses have numbers 31 (BSP socket) and 63 (second socket), and there
is no functions pci0.31.0.0 or pci0.63.0.0. According to the CPU
datasheet, all devices on the uncore bus occupy slots >= 8.
Practically, the attach to config buses is required for the intel-pcm
pcm-memory.x tool to work, for instance.
Use IOAPIC PCI rid as the interrupt TLP source id for DMAR interrupt
remapping.
VT-d specification requires use of PCI rid as source id for IOAPICs
enumerated by PCI bus. The values from the DMAR ACPI table should be
only used when IOAPIC is not on PCI.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Hardware provided by: Intel
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12205
Add an ioapic_get_rid() function to obtain PCIe TLP requester-id for
the interrupt messages from given IOAPIC, if the IOAPIC can be
enumerated on PCI bus.
If IOAPIC has PCI binding, match the PCI device against MADT
enumerated IOAPIC. Match is done first by registers window physical
address, then by IOAPIC ID as read from the APIC ID register.
PCI bsf address of the matched PCI device is the rid.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Hardware provided by: Intel
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D12205
As with r323317, hold off on releasing the intrhook during boot until
we're ready to accept probing from GEOM. Untested, but the pattern is
the same as with aac.
Move the intrhook release to later in the function so that GEOM knows to wait longer
for possible root devices to come online. This fixes a race that seems to be
triggered by EARLY_AP_STARTUP.
cem [Fri, 8 Sep 2017 15:08:17 +0000 (15:08 +0000)]
Audit userspace geom code for leaking memory to disk
Any geom class using g_metadata_store, as well as geom_virstor which
duplicated g_metadata_store internally, would dump sectorsize - mdsize bytes
of userspace memory following the metadata block stored. This is most or all
geom classes (gcache, gconcat, geli, gjournal, glabel, gmirror, gmultipath,
graid3, gshsec, gstripe, and geom_virstor).
We currently initialize the vm_page array in three passes: one to zero
the array, one to initialize the "order" field of each page (necessary
when inserting them into the vm_phys buddy allocator one-by-one), and
one to initialize the remaining non-zero fields and individually insert
each page into the allocator.
Merge the three passes into one following a suggestion from alc:
initialize vm_page fields in a single pass, and use vm_phys_free_contig()
to efficiently insert physical memory segments into the buddy allocator.
This reduces the initialization time to a third or a quarter of what it
was before on most systems that I tested.
cem [Thu, 7 Sep 2017 21:31:07 +0000 (21:31 +0000)]
x86 MCA: Enable AMD thresholding support on 17h
17h supports MCA thresholding in the same way as 16h and earlier.
Supposedly a ScalableMca feature bit in CPUID 8000_0007:EBX must be set, but
that was not true for earlier models, so be careful about relying on it.
While here, document a missing bit in LS MCA MISC0.
Add basic tests for chflags, mkdir, rcp, and rmdir
Add basic command line parsing test coverage for these utilities. The tests
were automatically generated based on their man pages. These tests can be
expanded by hand for more thorough coverage. The aim is to generate very
basic amount of test coverage for all the utilities in the base system.
andrew [Thu, 7 Sep 2017 15:45:56 +0000 (15:45 +0000)]
Add the ARMv8.2 ID register additions and use them to decode the register
values. As not all assemblers understand the new ID_AA64MMFR2_EL1 register
add a macro to access it. This seems to be safe for older CPUs to read this
new register, with them returning zero.
We need to extend the -Wno-format hack to yet another Makefile to cope
with %S meaning (CHAR16 *) not (wchar_t *) in the context of the EFI
boot loaders.
Split out asciidump, utf8dump, bindump, and hexdump into a separate
file efiutil.c. Implement new efi_print_load_option for printing out
the EFI_LOADER_OPTION data structure used to specify different options
to the UEFI boot manager.
efidp_size will return the size, in bytes, of a EFI device path
structure. This is a convenience wrapper in the same style as the
other linux routines. It's implemented by GetDevicePathSize from EDK2
we already needed for other things.
Rename boot1's wcslen to ucs2len, which we can't use in userland
because wchar in userland is unsigned, not short. Move it into
efichar.c. Also spell '* 2' as '* sizeof(efi_char)' and add 1 for the
trailing NUL to transition the FreeBSD boot env vars to being NUL
terminated on the same line...
cem [Thu, 7 Sep 2017 07:24:22 +0000 (07:24 +0000)]
cam(4): Fix some warnings
When bcopy is treated as memcpy/memmove, Clang produces warnings that the
size argument doesn't match the type of the source. This is true, it
doesn't match; we're aliasing the source.
Explicitly cast the source pointer to the expected type to remove the
warning.
In the recvmsg32() system call iterate over returned structure(s)
and convert any messages of types SCM_BINTIME, SCM_TIMESTAMP,
SCM_REALTIME and SCM_MONOTONIC from 64-bit to its 32-bit
representation. Otherwise we either run out of user-supplied
buffer to copy those out resulting in the MSG_CTRUNC or simply
return values that the userland 32-bit code is not going
to parse correctly. This fixes at least two regression tests
failing to function properly in 32-bit compat mode:
dab [Thu, 7 Sep 2017 00:20:17 +0000 (00:20 +0000)]
Add a new getty/gettytab capability to generate an initial message dynamically.
This modification adds a new gettytab(5) option (iM) to specify a
program to run that will generate the initial (banner) message that is
displayed before the login prompt. Such a capability is useful when
dynamic information is needed in the banner message that cannot be
supplied by the set of % substitution sequences available in the "im"
option.
dim [Wed, 6 Sep 2017 21:21:13 +0000 (21:21 +0000)]
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
5.0.0 release (upstream r312559).
Release notes for llvm, clang and lld will be available here soon:
<http://releases.llvm.org/5.0.0/docs/ReleaseNotes.html>
<http://releases.llvm.org/5.0.0/tools/clang/docs/ReleaseNotes.html>
<http://releases.llvm.org/5.0.0/tools/lld/docs/ReleaseNotes.html>
While __read_mostly groups variables together, their placement is not
specified. In particular 2 frequently used variables can end up in
different lines.
This annotation is only expected to be used for variables read all the time,
e.g. on each syscall entry.
Start annotating global _padalign locks with __exclusive_cache_line
While these locks are guarnteed to not share their respective cache lines,
their current placement leaves unnecessary holes in lines which preceeded them.
For instance the annotation of vm_page_queue_free_mtx allows 2 neighbour
cachelines (previously separate by the lock) to be collapsed into 1.
The annotation is only effective on architectures which have it implemented in
their linker script (currently only amd64). Thus locks are not converted to
their not-padaligned variants as to not affect the rest.
bnxt: Use correct firmware call for number of queues supported
1) Based on the suggestion from firmware team, derive
scctx->isc_ntxqsets_max & scctx->isc_nrxqsets_max based on FUNC_QCFG
(instead of FUNC_QCAPS).
2) Bump-up driver version to "1.0.0.2".
In swp_pager_meta_build(), if the requested operation results in
freeing the last swap pointer in the swblk, free the trie node. Other
swap pager code does not expect to find completely empty swblk.
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Reflect realtime and idle priorities in ps(1) state flags, same like
we do for the usual nice values. It could be argued that they should
use another set of indicators, since the underlying mechanism is
different, but they match the description in the manual page, and so
I think it's ok to not overcomplicate things.
Add support for generic backpressure indicator for ratelimited
transmit queues aswell as non-ratelimited ones.
Add the required structure bits in order to support a backpressure
indication with ratelimited connections aswell as non-ratelimited
ones. The backpressure indicator is a value between zero and 65535
inclusivly, indicating if the destination transmit queue is empty or
full respectivly. Applications can use this value as a decision point
for when to stop transmitting data to avoid endless ENOBUFS error
codes upon transmitting an mbuf. This indicator is also useful to
reduce the latency for ratelimited queues.
Enable dtrace support for mips64 and the ERL kernel config
Turn on the required options in the ERL config file, and ensure
that the fbt module is listed as a dependency for mips in
the modules/dtrace/dtraceall/dtraceall.c file.
"zfs mount -o" passes a list of mount options directly to nmount(2) after
sanity checking them. In particular, zfs(8) will refuse to mount an already
existing file system unless "remount" is specified in the option list.
However, the "remount" option only exists in Illumos. FreeBSD's equivalent is
"update".
The existing code in zmount incorrectly parses the comma-delimited option
string. The result is that nmount only honors the last option. AFAICT the
parsing has been broken ever since ZFS's initial import in change 168404.
Enable the in-tree binutils to assemble and disassemble amd64 FSGSBASE
instructions (rdfsbase, rdgsbase, wrfsbase, wrgsbase), used in the base
system since r322763.
This gives one last gasp for in-tree gcc, and provides a small
enhancement for in-tree binutils objdump.
cem [Tue, 5 Sep 2017 15:19:14 +0000 (15:19 +0000)]
amdtemp(4): Add support for Family 17h temperature sensor
The sensor value is formatted similarly to previous models (same
bitfield sizes, same units), but must be read off of the internal
System Management Network (SMN) from the System Management Unit (SMU)
co-processor.
PR: 218264
Reported and tested by: Nils Beyer <nbe AT renzel.net>
Reviewed by: avg (no +1), mjoras, truckman
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12217
cem [Tue, 5 Sep 2017 15:13:41 +0000 (15:13 +0000)]
Add smn(4) driver for AMD System Management Network
AMD Family 17h CPUs have an internal network used to communicate between
the host CPU and the PSP and SMU coprocessors. It exposes a simple
32-bit register space.
Make root_mount_rel(9) ignore NULL arguments, like it used to before r313351.
It would be better to fix API consumers to not pass NULL there - most of them,
such as gmirror, already contain the neccessary checks - but this is easier
and much less error-prone.
One known user-visible result is that it fixes panic on a failed "graid label".
Now that CloudABI's sockets API has been changed to be addressless and
only connected socket instances are used (e.g., socket pairs), they have
become fairly similar to pipes. The only differences on CloudABI is that
socket pairs additionally support shutdown(), send() and recv().
To simplify the ABI, we've therefore decided to remove pipes as a
separate file descriptor type and just let pipe() return a socket pair
of type SOCK_STREAM. S_ISFIFO() and S_ISSOCK() are now defined
identically.
Fix loader bug causing too many pages allocation when bootloader is U-Boot
FreeBSD loader expects to have mmsz variable set by bootloader.
U-Boot behaviour is that if buffer size is not big enough to keep
whole memory map, assign the smallest correct buffer size to sz
and return error.
In other words U-Boot assumes that nobody will need mmsz value when buffer
is not filled with memory map, which is not true, so calculated pages value
was too big to allocate.
Two modules with the same name cannot be loaded, so Marvell specific drivers
cannot have the same name as generic drivers.
Files with the same name, even in different folder overlaps their .o files.
Change armada38x/rtc.c to armada38x/armada38x_rtc.c fix it.
Preparation for adding this driver to GENERIC config for ARMv7
Marvell platforms.