Alexander Motin [Thu, 20 Aug 2020 02:54:44 +0000 (02:54 +0000)]
MFC r364185: Fill device serial_num and device_id in NVMe XPT.
It allows to report GEOM::lunid for nda(4) same as for nvd(4). Since
NVMe now allows multiple LUs (namespaces) with multiple paths unique
LU identification is important. The serial_num field is filled same
as before with the controller serial number, while device_id is based
on namespace GUID and/or EUI64 fields as recommended by "NVM Express:
SCSI Translation Reference" and matching nvd(4) at the end.
MFC r364109:
Need to clone the task struct fields related to RCU aswell in the
LinuxKPI after r359727. This fixes a minor regression issue. Else the
priority tracking won't work properly when both sleepable and
non-sleepable RCU is in use on the same thread.
Bump the __FreeBSD_version to force recompilation of external kernel
modules.
MFC r359438 and r359477:
Remove the "config" taskqgroup and its KPIs.
Equivalent functionality is already provided by taskqueue(9), just use
that instead.
Interfaces may be detached from a taskqueue_thread task, for example by
prison_complete(), so after r359438, when draining the queue we may end
up deadlocking.
MFC r364028:
Implement radix_tree_store() in the LinuxKPI for use with the coming
extensible arrays implementation.
While at it add some more comments explaining the current
radix_tree_insert() function and make sure to clean the root node when
the radix tree reaches the maximum height. This can happen if the
index passed is too big when the tree is empty.
The radix_tree_store() function is basically a copy of the
radix_tree_insert() function with some added functionality.
The radix_tree_store() function is local to FreeBSD and does not yet
exist in Linux.
MFC r364019:
Add full support support for dynamic allocation and freeing of epoch's.
Make sure to reclaim epoch structures when they are freed to support
dynamic allocation and freeing of epoch structures.
While at it, move the 64 supported epoch control structures to the
static memory domain. This overall simplifies the management and
debugging of system epoch's.
MFC r363397: Fix style and comment around concave/convex regions in TCP cubic.
In cubic, the concave region is when snd_cwnd starts growing slower
towards max_cwnd (cwnd at the time of the congestion event), and
the convex region is when snd_cwnd starts to grow faster and
eventually appearing like slow-start like growth.
MFC r363380: Add MODULE_VERSION to TCP loadable congestion control modules.
Without versioning information, using preexisting loader /
linker code is not easily possible when another module may
have dependencies on pre-loaded modules, and also doesn't
allow the automatic loading of dependent modules.
Cy Schubert [Tue, 18 Aug 2020 20:41:03 +0000 (20:41 +0000)]
MFC r364133:
When booting a system with WITHOUT_IPFILTER the following errors
are encountered at boot time:
rcorder: requirement `ipfs' in file `/etc/rc.d/netif' has no providers.
rcorder: requirement `ipfilter' in file `/etc/rc.d/netif' has no
providers.
rcorder: requirement `ipfilter' in file `/etc/rc.d/netwait' has no
providers.
rcorder: requirement `ipfilter' in file `/etc/rc.d/net_watchdog' has no
providers.
rcorder: requirement `ipfilter' in file `/etc/rc.d/securelevel' has no
providers.
Listing its own requrements in BEFORE rather than use REQUIRE of
non-optional scripts resolves this issue.
The issue was discovered and patched by glebius at Netflix.
Peter Grehan [Tue, 18 Aug 2020 03:40:09 +0000 (03:40 +0000)]
MFC 363596, 363719, 363733
363596 Support the setting of additional AHCI controller parameters.
363719 Definition for the 'removable media flag' from word 0 in the Identify page.
363733 Replace magic numbers in Identify page register 0 with ATA definitions.
A number of components require OpenSSL and fail to build if it is not
enabled. As a first phase force these off under WITHOUT_OPENSSL. A
second phase should make these more fine-grained, allowing the component
to build but without OpenSSL.
Fix bogomips calculation. Previously it was off by half. This was
verified under VMWare Fusion, comparing to what's reported under CentOS,
and by comparing numbers reported by linuxulator on T420 with a googled
up Linux cpuinfo (https://lkml.org/lkml/2011/11/29/116).
Make linprocfs(5) report correct tty number in /proc/<PID>/stat.
Fixes sudo (sudo-1.8.21p2-3ubuntu1.2); previously would fail
with "sudo: no tty present and no askpass program specified".
Make linprocfs(5) create /proc/bus/pci/devices/, and linsysfs(5)
create /sys/class/power_supply/. This silences some warnings
from biology/linux-foldingathome.
Reported by: 0mp
Sponsored by: The FreeBSD Foundation
Alexander Motin [Thu, 13 Aug 2020 00:42:09 +0000 (00:42 +0000)]
MFC r363979: Add CTL support for REPORT IDENTIFYING INFORMATION command.
It allows to report to initiator LU identifying information, preset via
"ident_info" and "text_ident_info" options.
Unfortunately it is impossible to implement SET IDENTIFYING INFORMATION,
since we have no persistent storage it requires, so the information is
read-only for initiator and has to be set out-of-band.
MFC r363888:
Handle delayed checksums if needed in NAT64.
Upper level protocols defer checksums calculation in hope we have
checksums offloading in a network card. CSUM_DELAY_DATA flag is used
to determine that checksum calculation was deferred. And IP output
routine checks for this flag before pass mbuf to lower layer. Forwarded
packets have not this flag.
NAT64 uses checksums adjustment when it translates IP headers.
In most cases NAT64 is used for forwarded packets, but in case when it
handles locally originated packets we need to finish checksum calculation
that was deferred to correctly adjust it.
Add check for presence of CSUM_DELAY_DATA flag and finish checksum
calculation before adjustment.
Gordon Bergling [Wed, 12 Aug 2020 07:00:06 +0000 (07:00 +0000)]
MFC r363907: environ(7): Update the description and include some more environment variables
- Add a better introduction to the DESCRIPTION section
- Add a description for MANPATH and POSIXLY_CORRECT
- Asorted improvements for the usage of some macros
Eugene Grosbein [Tue, 11 Aug 2020 11:32:44 +0000 (11:32 +0000)]
MFC r363064,363067,363081,363136: optimize install(1) a bit.
Currently, "install -s -S" behaviour is inefficient for upgrade.
First it finds that destination file already exists and copies
source file to temporary file. Then it calls strip(1)
with name of temporary file as single agrument and our strip(1) creates
another temporary file in the /tmp (or TMPDIR) making another copy
that is finally copied to DESTDIR third time.
Meantime, strip(1) has an option "-o dst" to specify destination
so install(1) is allowed to skip initial copying from obj to DESTDIR.
This change makes it do so.
This optimization descreases total amount of data sent to
both of /tmp and DESTDIR during "make installworld" by 32% or so.
See the differential for details.
Gordon Bergling [Tue, 11 Aug 2020 10:56:44 +0000 (10:56 +0000)]
MFC r363829: directory(3): Add an ERRORS section
- Add an ERRORS section for opendir(3) and closedir(3)
- Document also the errors of readdir(3), readdir_r(3) and telldir(3)
- Convert the code sample into an EXAMPLES section
Rick Macklem [Tue, 11 Aug 2020 05:10:01 +0000 (05:10 +0000)]
MFC: r363210
Fix the pNFS flexible file layout client for servers with small write size.
The code in nfscl_dofflayout() loops when a flexible file layout server
provides a small write data limit (no extant server is known to do this).
If/when it looped, it erroneously reused the "drpc" argument for the
mirror worker thread, corrupting it.
This patch fixes the problem by only using the calling thread after the
first loop iteration.
Found during testing by simulating a server with a small write size.
Since no extant pNFS server is known to provide a small write size,
this fix it not needed in practice at this time.
Alexander Motin [Tue, 11 Aug 2020 00:41:48 +0000 (00:41 +0000)]
MFC r363852: Remove extra memset() left after r342388.
This memset() wiped MPI2_FUNCTION_SCSI_TASK_MGMT set by mprsas_alloc_tm(),
that broke target reset on device removal, making later re-insertion into
the same slot impossible, since firmware was still waiting for the driver
to finish with the removed device.
iflib: initialize netmap with the correct number of descriptors
In case the network device has a RX or TX control queue, the correct
number of TX/RX descriptors is contained in the second entry of the
isc_ntxd (or isc_nrxd) array, rather than in the first entry.
This case is correctly handled by iflib_device_register() and
iflib_pseudo_register(), but not by iflib_netmap_attach().
If the first entry is larger than the second, this can result in a
panic. This change fixes the bug by introducing two helper functions
that also lead to some code simplification.
Dimitry Andric [Mon, 10 Aug 2020 17:35:58 +0000 (17:35 +0000)]
MFC r363988:
Fix clang 11 -Wformat warnings in yp_mkdb:
usr.sbin/yp_mkdb/yp_mkdb.c:91:40: error: format specifies type 'char *' but the argument has type 'void *' [-Werror,-Wformat]
printf("%.*s %.*s\n", (int)key.size, key.data, (int)data.size,
~~~~ ^~~~~~~~
usr.sbin/yp_mkdb/yp_mkdb.c:92:7: error: format specifies type 'char *' but the argument has type 'void *' [-Werror,-Wformat]
data.data);
^~~~~~~~~
Import new 2-clause BSD licenced implementation of the bc and dc commands
These implementations of the bc and dc programs offer a number of advantages
compared to the current implementations in the FreeBSD base system:
- They do not depend on external large number functions (i.e. no dependency
on OpenSSL or any other large-number library)
- They implements all features found in GNU bc/dc (with the exception of
the forking of sub-processes in dc, which the author of this version
considers a security issue).
- They are significantly faster than the current code in base (more than
2 orders of magnitude in some of my tests, e.g. for 12345^100000).
- They should be fully compatible with all features and the behavior of the
current implementations in FreeBSD (not formally verified).
- They support POSIX message catalogs and come with localized messages in
Chinese, Dutch, English, French, German, Japanese, Polish, Portuguese,
and Russian.
- They offer very detailed man-pages that provide far more information than
the current ones.
This code is not built and installed in FreeBSD-12, unless WITH_GH_BC=yes
is set (e.g. in src.conf) during builworld and installworld. It has been
imported and made the default in FreeBSD-CURRENT, and all improvements and
changes requested for the version in -CURRENT are included in this initial
import into a stable branch.
The only thing this tunable enables now is reporting to ACPI _OSC that
Active State Power Management and Clock Power Management Capability are
"supported" by the OS.
I've found that at least some Supermicro server boards do not allow OS
to support native PCIe hot-plug unless it reports those capabilities.
After spending significant time in PCIe specs I have found very little
motivation for that, and none of it applies to those motherboards, not
enabling ASPM themselves. So unless OS explicitly wants to save power,
I see nothing for it to do there actually.
I guess it may get sense to support ASPM when we get Thunderbolt support.
Otherwise I have no system with PCIe hot-plug where power saving matters.
It would be nice to enable this by default, but I worry that it affect
power saving of some laptops, even though I haven't noticed that myself.
Gordon Bergling [Fri, 7 Aug 2020 15:03:20 +0000 (15:03 +0000)]
MFC r363291: devstat(9): Update the man page to reflect the current implementation
- Rename devstat_add_entry to devstat_new_entry
- Update the description of devstat_trans_flags
- Add manpage aliases for devstat_start_transaction_bio and devstat_end_transaction_bio
Alexander Motin [Fri, 7 Aug 2020 01:05:10 +0000 (01:05 +0000)]
MFC r363624: Add initial driver for ACPI Platform Error Interfaces.
APEI allows platform to report different kinds of errors to OS in several
ways. We've found that Supermicro X10/X11 motherboards report PCIe errors
appearing on hot-unplug via this interface using NMI. Without respective
driver it ended up in kernel panic without any additional information.
This driver introduces support for the APEI Generic Hardware Error Source
reporting via NMI, SCI or polling. It decodes the reported errors and
either pass them to pci(4) for processing or just logs otherwise. Errors
marked as fatal still end up in kernel panic, but some more informative.
When somebody get to native PCIe AER support implementation both of the
reporting mechanisms should get common error recovery code. Since in our
case errors happen when the device is already gone, there is nothing to
recover, so the code just clears the error statuses, practically ignoring
the otherwise destructive NMIs in nicer way.
Alexander Motin [Fri, 7 Aug 2020 00:56:20 +0000 (00:56 +0000)]
MFC r360328 (by vangyzen):
Fix handling of NMIs from unknown sources (BMC, hypervisor)
Release kernels have no KDB backends enabled, so they discard an NMI
if it is not due to a hardware failure. This includes NMIs from
IPMI BMCs and hypervisors.
Furthermore, the interaction of panic_on_nmi, kdb_on_nmi, and
debugger_on_panic is confusing.
Respond to all NMIs according to panic_on_nmi and debugger_on_panic.
Remove kdb_on_nmi. Expand the meaning of panic_on_nmi by making
it a bitfield. There are currently two bits: one for NMIs due to
hardware failure, and one for all others. Leave room for more.
If panic_on_nmi and debugger_on_panic are both true, don't actually panic,
but directly enter the debugger, to allow someone to leave the debugger
and [hopefully] resume normal execution.
Relnotes: yes: machdep.kdb_on_nmi is gone; machdep.panic_on_nmi changed
Alexander Motin [Fri, 7 Aug 2020 00:40:28 +0000 (00:40 +0000)]
MFC r363527: Allow swi_sched() to be called from NMI context.
For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.
To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs. On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY.
Alexander Motin [Fri, 7 Aug 2020 00:26:16 +0000 (00:26 +0000)]
MFC r363490: Make lapic_ipi_vectored(APIC_IPI_DEST_SELF) NMI safe.
Sending IPI to self or all CPUs does not require write into upper part of
the ICR, prone to races. Previously the code disabled interrupts, but it
was not enough for NMIs. Instead of that when possible write only lower
part of the register, or use special SELF IPI register in x2APIC mode.
This also removes ICR reads used to preserve reserved bits on write.
It was there from the beginning, but I failed to find explanation why,
neither I see Linux doing it. Specification even tells that ICR content
may be lost in deep C-states, so if hardware does not bother to preserve
it, why should we?