CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

MFC r339472: rc.initdiskless: fix commentary grammar after r339465

MFC r339465: rc.initdiskless: add support for auxiliary NVRAM.

  Currently, rc.initdiskless assumes that local system configuration
  changes are kept in some mountable file system. For example,
  nanobsd uses dedicated partition mounted as /cfg for this.

  However, small embedded devices like MIPS routers may not have enough flash
  space to keep a full-blown file system, but only one or a couple of
  small flash blocks to keep persistent local configuration overrides.

  This change extends rc.initdiskless and introduces the ability to run an
  auxiliary command /conf/T/M/extract that is supposed to extract configuration
  overrides from such local storage.

  For example, the command /conf/default/etc/extract may contain something like:

  cd "$1" && bsdcpio --quiet -idu < /dev/map/cfg

  The bsdcpio command extracts a compressed archive from the storage to /etc,
  assuming the storage is exposed by the kernel as /dev/map/cfg to userland.

PR: 204215

MFC r340487:
Align IA32_ARCH_CAP MSR definitions and use with SDM rev. 068.

MFC: r339999
Fix NFS client vnode locking to avoid a crash during forced dismount.

A crash was reported in nfs_advlock() while the NFS_ISV4(vp) macro was
being executed. This was caused by the vnode being VI_DOOMED due to a
forced dismount in progress.
This patch fixes the problem by locking the vnode before executing the
NFS_ISV4() macro.

PR: 232673

MFC r340299: Octeon SDK: avoid use of uninitialized variable

Reported by: Clang

MFC r340288: nvi: remove superfluous space before ^\

This fixes alignment in vi's 'viusage' command and has been fixed
upstream and in OpenBSD.

Submitted by: Raf Czlonka (github:rjc)

MFC r340329: build(7): clarify buildenv target can be used for non-cross builds

make buildenv can be used for building for the same architecture as
the host (perhaps this is a degenerate case of cross-building).
TARGET and TARGET_ARCH do not need to be set in this case.

Sponsored by: The FreeBSD Foundation

MFC r340072:

pfsync: Add missing unlock

If we fail to set up the multicast entry for pfsync and return an error
we must release the pfsync lock first.

Sponsored by: Orange Business Services

MFC r340070:

pfsync: Allow module to be unloaded

Sponsored by: Orange Business Services

MFC r340068:

pfsync: Handle syncdev going away

If the syncdev is removed we no longer need to clean up the multicast
entry we've got set up for that device.

Pass the ifnet detach event through pf to pfsync, and remove our
multicast handle, and mark us as no longer having a syncdev.

Note that this callback is always installed, even if the pfsync
interface is disabled (and thus it's not a per-vnet callback pointer).

Sponsored by: Orange Business Services

MFC r340067:

pfsync: Ensure uninit is done before pf

pfsync touches pf memory (for pf_state and the pfsync callback
pointers), not the other way around. We need to ensure that pfsync is
torn down before pf.

Sponsored by: Orange Business Services

MFC r340066:

Notify that the ifnet will go away, even on vnet shutdown

pf subscribes to ifnet_departure_event events, so it can clean up the
ifg_pf_kif and if_pf_kif pointers in the ifnet.
During vnet shutdown interfaces could go away without sending the event,
so pf ends up cleaning these up as part of its shutdown sequence, which
happens after the ifnet has already been freed.

Send the ifnet_departure_event during vnet shutdown, allowing pf to
clean up correctly.

Sponsored by: Orange Business Services

MFC r340065:

pfsync: Make pfsync callbacks per-vnet

The callbacks are installed and removed depending on the state of the
pfsync device, which is per-vnet. The callbacks must also be per-vnet.

Sponsored by: Orange Business Services

MFC r339676:

pf: Fix copy/paste error in IPv6 address rewriting

We checked the destination address, but replaced the source address. This was
fixed in OpenBSD as part of their NAT rework, which we don't want to import
right now.

CID: 1009561

MFC r339578:

pfctl: Fix line numbers when \ is used inside ""

PR: 201520
Obtained from: OpenBSD

MFC r339470:

pf synproxy will do the 3WHS on behalf of the target machine, and once
the 3WHS is completed, establish the backend connection. The trigger
for "3WHS completed" is the reception of the first ACK. However, we
should not proceed if that ACK also has RST or FIN set.

PR: 197484
Obtained from: OpenBSD
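
The synproxy condition described in r339470 can be sketched as a flag test.
This is a minimal illustration, not the pf code itself: the helper name
completes_handshake is hypothetical, and only the flag bit values are taken
from the TCP specification.

```python
# TCP header flag bits as defined by the TCP specification.
TH_FIN = 0x01
TH_SYN = 0x02
TH_RST = 0x04
TH_ACK = 0x10


def completes_handshake(flags: int) -> bool:
    """Return True if a segment may count as the final ACK of the 3WHS.

    Hypothetical helper: the ACK bit must be set, and neither RST nor FIN
    may accompany it.
    """
    if not flags & TH_ACK:
        return False
    return not flags & (TH_RST | TH_FIN)


print(completes_handshake(TH_ACK))           # True
print(completes_handshake(TH_ACK | TH_RST))  # False
```

Only the first case would let the proxy go on to establish the backend
connection.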

MFC r339897:
Remove rtld use of libc amd64_set_fsbase().

MFC r340136:
Move the fixed base for PIE loading on arm.

MFC r339464:

pfctl: Dup strings

When we set the ifname we have to copy the string, rather than just keep
the pointer.

PR: 231323

MFC 339312,339364: Restore more descriptors during VM exits.

339312:
Fully restore the GDTR, IDTR, and LDTR after VT-x VM exits.

The VT-x VMCS only stores the base address of the GDTR and IDTR.  As a
result, VM exits use a fixed limit of 0xffff for the host GDTR and
IDTR, losing the smaller limits set when the initial GDT is loaded
on each CPU during boot.  Explicitly save and restore the full GDTR
and IDTR contents around VM entries and exits to restore the correct
limit.

Similarly, explicitly save and restore the LDT selector.  VM exits
always clear the host LDTR as if the LDT was loaded with a NULL
selector and a userspace hypervisor is probably using a NULL selector
anyway, but save and restore the LDT explicitly just to be safe.

339364:
Reload the LDT selector after an AMD-v #VMEXIT.

cpu_switch() always reloads the LDT, so this can only affect the
hypervisor process itself.  Fix this by explicitly reloading the host
LDT selector after each #VMEXIT.  The stock bhyve process on FreeBSD
never uses a custom LDT, so this change is cosmetic.

PR: 230773

Revert r340541. It requires VNET_DEFINE_STATIC() macro that is not yet
merged into stable/11.

MFC r339544:
  Call inet_ntop() only when its result is needed.

  Obtained from: Yandex LLC
  Sponsored by: Yandex LLC

MFC r339542:
  Retire the IPFIREWALL_NAT64_DIRECT_OUTPUT kernel option and add the ability
  to switch the output method at run time. Also document some sysctl
  variables that can be changed for the NAT64 module.

  NAT64 had the compile-time option IPFIREWALL_NAT64_DIRECT_OUTPUT to use
  if_output directly from the nat64 module. By default, the netisr-based
  output method is used. Now both methods can be used, but they require
  different handling by rules.

  Obtained from: Yandex LLC
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D16647

MFC r339533:
  Add sadb_x_sa2 extension to SADB_ACQUIRE requests.

  SADB_ACQUIRE requests are sent by the kernel when a security policy doesn't
  have a corresponding security association for an outbound packet. An IKE
  daemon usually registers its handler for such messages, and when the kernel
  asks for an SA it can handle this request. Now such requests will contain
  additional fields that can help the IKE daemon to create an SA. IKE can now
  create SAs using only information from the SADB_ACQUIRE request; this
  is useful when many if_ipsec(4) interfaces are in use and IKE doesn't track
  security policies that were installed by the kernel.

  Obtained from: Yandex LLC
  Sponsored by: Yandex LLC

MFC r339539:
  Add the IPFW_RULE_JUSTOPTS flag, which is used by ipfw(8) to mark a rule
  that was added using the "new rule format". Then, when the kernel
  returns a rule with this flag, ipfw(8) can show it correctly.

  Reported by: lev
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D17373

MFC r339545:
  Do not decrement RST life time if keep_alive is not turned on.

  This allows the use of different values configured by the user for the
  sysctl variable net.inet.ip.fw.dyn_rst_lifetime.

  Obtained from: Yandex LLC
  Sponsored by: Yandex LLC

MFC r339535:
  Do not allow the `create` keyword to be used as a hostname when ifconfig(8)
  is invoked for an already existing interface.

  It turned out that ifconfig(8) treats the `create` keyword as a hostname and
  tries to resolve it when `ifconfig ifname create` is invoked for an already
  existing interface. This can produce unexpected results when the hostname
  resolves successfully. This patch adds a check for this case.
  When an interface already exists and create is the only argument,
  return an error message. But when there are other arguments, just remove
  the create keyword from the argument list.

  Obtained from: Yandex LLC
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D17171

MFC r339536:
  Fix grammar.

MFC 338511: bhyve: Use MAP_GUARD when mapping guest memory ranges.

Instead of relying on PROT_NONE mappings with MAP_ANON, use MAP_GUARD
to reserve address space around guest memory ranges including the
guard ranges of address space around mappings.

MFC r338977:

Add description, parameters, options, sysctl and examples of using AQMs to ipfw man page. CoDel, PIE, FQ-CoDel and FQ-PIE AQM for Dummynet exist in FreeBSD 11 and 10.3.

Submitted by: ralsaadi@swin.edu.au
Reviewed by: AllanJude
Differential Revision: https://reviews.freebsd.org/D12507

Fix a regression from prior to 11.2 that caused MSI (not MSI-X) interrupt
allocation to fail. While here, refactor the code so that it's more clear
and less likely to break in the future. This is not an MFC due to the code
in 12/head being very different, but it follows the latter's structure
more closely than before.

Reported by: Harry Schmalzbauer

MFC r340251:

  Update rum(4) and run(4) man pages to reflect that newer versions
  of TP-LINK TL-WN321G are run(4) and not rum(4) anymore.

  Reported by: J (tech-lists zyxst.net)

MFC r340248:
Don't read the USB audio sync endpoint when we don't use it to save
isochronous bandwidth.

Sponsored by: Mellanox Technologies

MFC r340249: ipfw.8: fix small syntax error in an example

Fix dtb path for beaglebone* boards.

This is a direct commit to 11 since head switched to Linux upstream DTS.

Reported by: jmg

MFC r338485 (jhb): libelf: Add gelf_mips64el.c to file list

MFC r340212:
Sometimes the complete split packet may be queued too early and the
transaction translator will return a NAK. Ignore this message and
retry the complete split instead.

Sponsored by: Mellanox Technologies

Fix objcopy for little-endian MIPS64 objects.

MFC r338478 (jhb): Fix objcopy for little-endian MIPS64 objects.

MIPS64 does not store the 'r_info' field of a relocation table entry as
a 64-bit value consisting of a 32-bit symbol index in the high 32 bits
and a 32-bit type in the low 32 bits as on other architectures.  Instead,
the 64-bit 'r_info' field is really a 32-bit symbol index followed by four
individual byte type fields.  For big-endian MIPS64, treating this as a
64-bit integer happens to be compatible with the layout expected by other
architectures (symbol index in upper 32-bits of resulting "native" 64-bit
integer).  However, for little-endian MIPS64 the parsed 64-bit integer
contains the symbol index in the low 32 bits and the 4 individual byte
type fields in the upper 32-bits (but as if the upper 32-bits were
byte-swapped).

To cope, add two helper routines in gelf_getrel.c to translate between the
correct native 'r_info' value and the value obtained after the normal
byte-swap translation.  Use these routines in gelf_getrel(), gelf_getrela(),
gelf_update_rel(), and gelf_update_rela().  This fixes 'readelf -r' on
little-endian MIPS64 objects which was previously decoding incorrect
relocations as well as 'objcopy: invalid symbox index' warnings from
objcopy when extracting debug symbols from kernel modules.

Even with this fixed, objcopy was still crashing when trying to extract
debug symbols from little-endian MIPS64 modules.  The workaround in
gelf_*rel*() depends on the current ELF object having a valid ELF header
so that the 'e_machine' field can be compared against EM_MIPS.  objcopy
was parsing the relocation entries to possibly rewrite the 'r_info' fields
in the update_relocs() function before writing the initial ELF header to
the destination object file.  Move the initial write of the ELF header
earlier before copy_contents() so that update_relocs() uses the correct
symbol index values.

Note that this change should really go upstream.  The binutils readelf
source has a similar hack for MIPS64EL though I implemented this version
from scratch using the MIPS64 ABI PDF as a reference.
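
The 'r_info' translation described for r338478 can be illustrated with a
small sketch. It is not the libelf code: the function names are hypothetical,
and the order of the four byte fields (r_ssym, r_type3, r_type2, r_type) is
an assumption, since the message only says "four individual byte type fields".

```python
import struct


def bswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def mips64el_r_info_to_native(raw: int) -> int:
    """Translate r_info as obtained from a plain little-endian 64-bit load.

    raw carries the symbol index in its low 32 bits and the four type bytes,
    byte-swapped, in its high 32 bits. The result carries the symbol index
    in the high 32 bits and the type bytes in the low 32 bits.
    """
    sym = raw & 0xFFFFFFFF
    types = bswap32(raw >> 32)
    return (sym << 32) | types


def native_to_mips64el_r_info(native: int) -> int:
    """Inverse translation, as needed when writing relocations back out."""
    sym = native >> 32
    types = bswap32(native & 0xFFFFFFFF)
    return (types << 32) | sym


# Assumed on-disk layout: 32-bit symbol index, then r_ssym, r_type3,
# r_type2, r_type (one byte each). Symbol index 7 and type 18 are examples.
on_disk = struct.pack("<IBBBB", 7, 0, 0, 0, 18)
raw = struct.unpack("<Q", on_disk)[0]
native = mips64el_r_info_to_native(raw)
print(native >> 32, native & 0xFF)  # 7 18
```

Packing the same fields big-endian and loading them as a big-endian 64-bit
integer yields the native value directly, which is why only the little-endian
case needs the extra step.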

MFC r339083 (emaste): libelf: correct mips64el test to use ELF header

libelf maintains two views of endianness: e_byteorder, and
e_ident[EI_DATA] in the ELF header itself.  e_byteorder is not always
kept in sync, so use the ELF header endianness to test for mips64el.

MFC r339473 (emaste): libelf: also test for 64-bit ELF in _libelf_is_mips64el

_libelf_is_mips64el is only called in contexts where we've
already checked that e_class is ELFCLASS64, but this may change in the
future.  Add a safety belt so that we don't access an invalid e_ehdr64
union member if it does.

PR: 231790

MFC r323632 (jhb): readelf: Add missing newline

after unknown MIPS-specific dynamic entries.

MFC r327219: readelf: report byte size for DT_PREINIT_ARRAYSZ

MFC r331078 (cem): nm: Initialize allocated memory before use

In out of memory scenarios (where one of these allocations failed but
other(s) did not), nm(1) could reference the uninitialized value of these
allocations (undefined behavior).

Always initialize any successful allocations as the most expedient
resolution of the issue. However, I would encourage upstream elftoolchain
contributors to clean up the error path to just abort immediately, rather
than proceeding sloppily when one allocation fails.

MFC r337287:

wmt(4): Read 'Contact count maximum' usage value from feature report

rather than from HID descriptor to match Microsoft documentation.
Fall back to HID descriptor provided value if 'Get Report' request failed.

MFC r337288:

wmt(4): Read Microsoft's "Touch Hardware Quality Assurance" certificate blob

if present to enable some devices like WaveShare touchscreens. Unlike
Windows, we discard the content of the blob. We try to mimic Windows driver
behaviour from the USB device point of view.

Submitted by: glebius (initial version)

MFC r337289:

wmt(4): Use internal function to calculate input report size

Usbhid's hid_report_size() calculates integral size of all reports of given
kind found in the HID descriptor rather than exact size of report with given
ID as its userland counterpart does. As all input data processed by the
driver is located within the same report, calculate required driver's buffer
size with userland version, imported in one of the previous commits.
This allows us to skip zeroing of buffer on processing of each report.

While here do some minor refactoring.

MFC r338458:

wmt(4): Fix regression introduced in r337289

r337289 has a side effect of reducing usb frame 0 buffer size down to
touch report size. That broke some devices e.g. "Raydium Touch System"
which are capable of generating non-touch frames of bigger length.
Fix it by enlarging the frame 0 buffer up to the internal wmt(4) buffer size.

Reported by: Roberto Fernandez Cueto <roberfern@gmail.com>
Tested by: Roberto Fernandez Cueto <roberfern@gmail.com>
Differential Revision: https://reviews.freebsd.org/D16772

MFC r340075: readelf: decode R_MIPS_HIGHER and R_MIPS_HIGHEST relocation types

Sponsored by: The FreeBSD Foundation

MFC r340076: Define NT_FREEBSD_FEATURE_CTL ELF note type

This ELF note will be used to allow binaries to opt out of, or in to,
upcoming vulnerability mitigation and other features.

Sponsored by: The FreeBSD Foundation

MFC r340171: capability.h: add comment about planned removal timeline

PR: 228878

MFC r325771, r325777, r325778 (all by jhb):

Only clear a pending thread event if one is pending.
This fixes a panic when attaching to an already-stopped process.

Also do some other clean ups for control flow of sendsig section.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation

MFC r340089:
Use correct type for IOCTL request argument.
This fixes signed IOCTL value warnings in uhsoctl().

Submitted by: Marcin Cieslak <saper@saper.info>
Sponsored by: Mellanox Technologies

MFC r340100:
  Do not use bzero() for the O_ICMP6TYPE opcode.

  The buffer is already zeroed in compile_rule() function, and also it
  may contain configured F_NOT flag in o.len field. This fixes the
  filling for "not icmp6types" opcode.

MFC r340175:
  Do not print "ip6" keyword in print_icmp6types() for O_ICMP6TYPE opcode.

  It produces incompatibility when rules listing is used again to
  restore saved ruleset, because "ip6" keyword produces separate opcode.
  The kernel already has the check and only IPv6 packets will be checked
  for matching.

PR: 232939

MFC 340164,340168,340170: Add custom cpu_lock_delay() for x86.

340164:
Add a KPI for the delay while spinning on a spin lock.

Replace a call to DELAY(1) with a new cpu_lock_delay() KPI.  Currently
cpu_lock_delay() is defined to DELAY(1) on all platforms.  However,
platforms with a DELAY() implementation that uses spin locks should
implement a custom cpu_lock_delay() that doesn't use locks.

340168:
Add a delay_tsc() static function for when DELAY() uses the TSC.

This uses slightly simpler logic than the existing code by using the
full 64-bit counter and thus not having to worry about counter
overflow.
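
A rough sketch of the idea behind 340168, assuming a monotonically
increasing 64-bit counter. The names delay_tsc (reused from the message),
FakeTsc, and the parameters are illustrative only; Python integers do not
wrap, which stands in for the full-width counter here.

```python
from typing import Callable


def delay_tsc(usec: int, tsc_freq_hz: int, read_tsc: Callable[[], int]) -> int:
    """Spin until `usec` microseconds' worth of counter ticks have passed.

    Returns the number of polls, for illustration only.
    """
    start = read_tsc()
    end = start + usec * tsc_freq_hz // 1_000_000
    polls = 0
    # With the full-width counter a simple comparison is enough; no
    # wrap-around handling is needed.
    while read_tsc() < end:
        polls += 1
    return polls


class FakeTsc:
    """Hypothetical stand-in for the hardware counter: +100 ticks per read."""

    def __init__(self, start: int) -> None:
        self.value = start

    def __call__(self) -> int:
        self.value += 100
        return self.value


# Starting just below 2**32 shows that crossing the 32-bit boundary
# needs no special case.
tsc = FakeTsc(2**32 - 500)
delay_tsc(1, 1_000_000_000, tsc)
```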

340170:
Add a custom implementation of cpu_lock_delay() for x86.

Avoid using DELAY() since it can try to use spin locks on CPUs without
a P-state invariant TSC.  For cpu_lock_delay(), always use the TSC if
it exists (even if it is not P-state invariant) to delay for a
microsecond.  If the TSC does not exist, read from I/O port 0x84 to
delay instead.

PR: 228768

MFC r340260 (emaste):
Avoid buffer underwrite in icmp_error

icmp_error allocates either an mbuf (with pkthdr) or a cluster depending
on the size of data to be quoted in the ICMP reply, but the calculation
failed to account for the additional padding that m_align may apply.

Include the ip header in the size passed to m_align. On 64-bit archs
this will have the net effect of moving everything 4 bytes later in the
mbuf or cluster. This will result in slightly pessimal alignment for
the ICMP data copy.

Also add an assertion that we do not move m_data before the beginning of
the mbuf or cluster.

Approved by: re (kib, insta-MFC)
Security: CVE-2018-17156
Sponsored by: The FreeBSD Foundation

MFC r340181, r340185:

On amd64 both Linux compat modules, linux.ko and linux64.ko, provide
linux_ioctl_(un)register_handler that allows other driver modules to
register ioctl handlers.  The ioctl syscall implementation in each Linux
compat module iterates over the list of handlers and forwards the call to
the appropriate driver.  Because the registration functions have the same
name in each module it is not possible for a driver to support both 32 and
64 bit linux compatibility.

Move the list of ioctl handlers to linux_common.ko so it is shared by
both Linux modules and all drivers receive both 32 and 64 bit ioctl calls
with one registration.  These ioctl handlers normally forward the call
to the FreeBSD ioctl handler which can handle both 32 and 64 bit.

Keep the special COMPAT_LINUX32 ioctl handlers in linux.ko in a separate
list for now and let the ioctl syscall iterate over that list first.
Later, COMPAT_LINUX32 support can be added to the 64 bit ioctl handlers
via a runtime check for ILP32 like is done for COMPAT_FREEBSD32 and then
this separate list would disappear again.  That is a much bigger effort
however and this commit is meant to be MFCable.

This enables linux64 support in x11/nvidia-driver*.

PR: 206711
Reviewed by: kib

MFC r335844:

  core(5): overwrite the oldest core dump

  The '%I' format in the kern.corefile sysctl limits the number of
  core files that a process can generate to the number stored in the
  debug.ncores sysctl. The '%I' format is replaced by the single digit
  index. Previously, if all indexes were taken the kernel would overwrite
  only a core file with the highest index in a filename.
  Currently the system will create a new core file if there is a free
  index or if all slots are taken it will overwrite the oldest one.

  Reviewed by:  kib(code), bcr (updating)
  Differential Revision:        https://reviews.freebsd.org/D15991
  Differential Revision:        https://reviews.freebsd.org/D16084
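
The new '%I' slot selection described in r335844 can be sketched as follows.
This is an illustration, not the kernel code: pick_core_index is a
hypothetical name, and judging "oldest" by modification time is an assumption.

```python
from typing import Dict


def pick_core_index(ncores: int, mtimes: Dict[int, float]) -> int:
    """Choose the %I index for the next core file.

    ncores plays the role of debug.ncores; mtimes maps each index already
    in use to the modification time of that core file.
    """
    for index in range(ncores):
        if index not in mtimes:
            return index  # a free slot exists: create a new file
    # All slots are taken: overwrite the oldest file.
    return min(range(ncores), key=lambda i: mtimes[i])


print(pick_core_index(3, {0: 10.0, 2: 30.0}))           # 1 (free slot)
print(pick_core_index(3, {0: 20.0, 1: 5.0, 2: 30.0}))   # 1 (oldest file)
```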

MFC r339896:
Initialize ifunc calling machinery earlier.

MFC r339892:
Clarify explanation of VFCF_SBDRY.

MFC r340137: rtld: move relro enforcement after ifunc processing

Previously the combination of relro (implicit), -z now and ifunc use
resulted in a segfault when applying ifuncs after relro (test binary
here just calls amd64_get_fsbase()):

| % env LD_DEBUG=1 libexec/rtld-elf/obj/ld-elf.so.1 a.out
| ...
| enforcing main obj relro
| ...
| resolving ifuncs
| reloc_jmpslot: *0x203198 = 0x189368ea4570
| zsh: bus error (core dumped) LD_DEBUG=1 obj/ld-elf.so.1 ~/a.out

MFC r339595: nfsrvd_readdirplus: for some errors, do not fail the entire request

Sponsored by: Panzura

MFC r339591: ichwd: add support for TCO watchdog timer in Lewisburg PCH (C620)

PR: 222079
Relnotes: maybe
Sponsored by: Panzura

MFC r306024: mrsas: update for sys/capability.h rename

Also followup fix in r312672 by jkim.

MFC r306023: auditdistd: update for sys/capability.h rename

Reported by: dhw

MFC r312758: Add sys/capability.h deprecation warning

In r263232 sys/capability.h was renamed to sys/capsicum.h, to avoid
conflicts with a capability.h header found on other operating systems.

Reported by: antoine
Sponsored by: The FreeBSD Foundation

e1000: Don't use 9k jumbo clusters

Backported to 11-STABLE from 12-CURRENT.
Avoids the issue with 9k jumbo cluster fragmentation
by maxing out at page size jumbo clusters for RX mbufs.

Submitted by: Ryan Moeller
Reviewed by: erj@
Differential Revision: https://reviews.freebsd.org/D16534

Backport of r338074 - generalize uart_bus_probe and add SNPS support to x86

Submitted by: Rajesh Kumar
Differential Revision: https://reviews.freebsd.org/D17381

MFC r337904:

  Allow the use of TCP instead of UDP for queries by setting options usevc
  in resolv.conf which sets RES_USEVC.

  Reviewed by: ume

MFC r330795:

  The vmresult table was missing most of the values apart from two due to
  extra "_" in the names we grep for. Add the "_" to the pattern.

  Reviewed by: jhb
  Sponsored by: iXsystems, Inc.

MFC r339931,r339933

  As a follow-up to r339930 and various reports, implement logging in case
  we fail during module load because the pcpu or vnet module sections are
  full.  We did return a proper error but left no indication to
  the user as to what the actual problem was.

PR: 228854

MFC r339431:

  In r78161 the lookup_set linker method was introduced which optionally
  returns the section start and stop locations as well as a count if the
  caller asks for them.
  There was only one out-of-file consumer of count which did not actually
  use it and hence was eliminated in r339407.
  In r194784 parse_dpcpu(), and in r195699 parse_vnet() (a copy of the
  former) started to use the link_elf_lookup_set() interface internally
  also asking for the count.

  count is computed as the difference of the void **stop - void **start
  locations and as such, if the absolute numbers
   (stop - start) % sizeof(void *) != 0
  a round-down happens, e.g., **stop 0x1003 - **start 0x1000 => count 0.

  To get the section size instead of "count is the number of pointer
  elements in the section", the parse_*() functions do a
   count *= sizeof(void *).
  They use the result to allocate memory and copy the section data
  into the "master" and per-instance memory regions with a size of
  count.

  As a result of count possibly being rounded down, this can miss the last
  bytes of the section.  The good news is that we do not touch
  out of bounds memory during these operations (we may at a later stage
  if the last bytes would overflow the master sections).
  Given relocation in elf_relocaddr() works based on the absolute
  numbers of start and stop, this means that we can possibly try to
  access relocated data which was never copied and hence we get
  random garbage or at best zeroed memory.

  Stop the two (last) consumers of count (the parse_*() functions)
  from using count as well, and calculate the section size based on
  the absolute numbers of stop and start and use the proper size for
  the memory allocation and data copies.  This will make the symbols
  in the last bytes of the pcpu or vnet sections be presented as
  expected.

PR: 232289
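
The rounding problem described in r339431 can be reproduced with plain
integer arithmetic. A minimal sketch, assuming 8-byte pointers; the function
names are made up for this illustration.

```python
POINTER_SIZE = 8  # sizeof(void *), assuming a 64-bit platform


def size_via_count(start: int, stop: int) -> int:
    """Old approach: derive the section size from the element count."""
    count = (stop - start) // POINTER_SIZE  # pointer subtraction rounds down
    return count * POINTER_SIZE


def size_via_addresses(start: int, stop: int) -> int:
    """New approach: use the absolute start and stop addresses."""
    return stop - start


print(size_via_count(0x1000, 0x1003))      # 0
print(size_via_addresses(0x1000, 0x1003))  # 3
```

With the count-based size the trailing three bytes are never allocated or
copied, which matches the "count 0" example in the message.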

MFC r339407:

  The countp argument passed to linker_file_lookup_set() in
  linker_load_dependencies() is unused, so no need to ask for the
  value in the first place.  Remove the unused "count" variable.

MFC r339930:

  With more excessive use of modules, more kernel parts working with
  VIMAGE, and feature richness and global state increasing the 8k of
  vnet module space are no longer sufficient for people and loading
  multiple modules, e.g., pf(4) and ipl(4) or ipsec(4) will fail on
  the second module.

  Increase the module space to 8 * PAGE_SIZE which should be enough
  to hold multiple firewalls, ipsec, multicast (as in the old days was
  a problem), epair, carp, and any kind of other vnet enabled modules.

  Sadly this is a global byte array part of the vnet_set, so we cannot
  dynamically change its size;  otherwise a TUNABLE would have been
  a better solution.

PR: 228854

MFC 338813: Clear all of the VFP state in fill_fpregs().

Zero the entire FP register set structure returned for ptrace() if a
thread hasn't used FP registers rather than leaking garbage in the
fp_sr and fp_cr fields.

MFC 338360,338415,338624,338630,338631,338725: Dynamic x86 IRQ layout.

338360:
Dynamically allocate IRQ ranges on x86.

Previously, x86 used static ranges of IRQ values for different types
of I/O interrupts.  Interrupt pins on I/O APICs and 8259A PICs used
IRQ values from 0 to 254.  MSI interrupts used a compile-time-defined
range starting at 256, and Xen event channels used a
compile-time-defined range after MSI.  Some recent systems have more
than 255 I/O APIC interrupt pins which resulted in those IRQ values
overflowing into the MSI range triggering an assertion failure.

Replace statically assigned ranges with dynamic ranges.  Do a single
pass computing the sizes of the IRQ ranges (PICs, MSI, Xen) to
determine the total number of IRQs required.  Allocate the interrupt
source and interrupt count arrays dynamically once this pass has
completed.  To minimize runtime complexity these arrays are only sized
once during bootup.  The PIC range is determined by the PICs present
in the system.  The MSI and Xen ranges continue to use a fixed size,
though this does make it possible to turn the MSI range size into a
tunable in the future.

As a result, various places are updated to use dynamic limits instead
of constants.  In addition, the vmstat(8) utility has been taught to
understand that some kernels may treat 'intrcnt' and 'intrnames' as
pointers rather than arrays when extracting interrupt stats from a
crashdump.  This is determined by the presence (vs absence) of a
global 'nintrcnt' symbol.

This change reverts r189404 which worked around a buggy BIOS which
enumerated an I/O APIC twice (using the same memory mapped address for
both entries but using an IRQ base of 256 for one entry and a valid
IRQ base for the second entry).  Making the "base" of MSI IRQ values
dynamic avoids the panic that r189404 worked around, and there may now
be valid I/O APICs with an IRQ base above 256 which this workaround
would incorrectly skip.

If in the future the issue reported in PR 130483 reoccurs, we will
have to add a pass over the I/O APIC entries in the MADT to detect
duplicates using the memory mapped address and use some strategy to
choose the "correct" one.

While here, reserve room in intrcnts for the Hyper-V counters.
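
The single sizing pass described for 338360 might look roughly like this.
It is a sketch under assumptions, not the x86 interrupt code: the function
name, the (irq_base, pin_count) representation, and the example sizes are
all hypothetical; only the ordering (PICs, then MSI, then Xen) follows the
message.

```python
from typing import Dict, List, Tuple


def layout_irq_ranges(
    io_pics: List[Tuple[int, int]], msi_size: int, xen_size: int
) -> Dict[str, int]:
    """Compute where each IRQ range starts and the total number of IRQs.

    io_pics holds one (irq_base, pin_count) pair per interrupt controller.
    """
    # The PIC range ends after the highest pin of any controller present.
    num_io_irqs = max((base + pins for base, pins in io_pics), default=0)
    first_msi_irq = num_io_irqs               # MSI follows the PIC range
    first_xen_irq = first_msi_irq + msi_size  # Xen follows MSI
    return {
        "num_io_irqs": num_io_irqs,
        "first_msi_irq": first_msi_irq,
        "first_xen_irq": first_xen_irq,
        "total_irqs": first_xen_irq + xen_size,
    }


# Two controllers with 24 and 240 pins exceed the old limit of 255 pin
# IRQs; the MSI range simply starts later instead of colliding.
print(layout_irq_ranges([(0, 24), (24, 240)], msi_size=512, xen_size=256))
```

The arrays for interrupt sources and counters would then be sized once from
the total.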

338415:
Fix build of x86 UP kernels after dynamic IRQ changes in r338360.

338624:
msi: remove the check that interrupt sources have been added

When running as a specific type of Xen guest the hypervisor won't
provide any emulated IO-APICs or legacy PICs at all, thus hitting the
following assert in the MSI code:

panic: Assertion num_io_irqs > 0 failed at /usr/src/sys/x86/x86/msi.c:334
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff826ffa70
vpanic() at vpanic+0x1a3/frame 0xffffffff826ffad0
panic() at panic+0x43/frame 0xffffffff826ffb30
msi_init() at msi_init+0xed/frame 0xffffffff826ffb40
apic_setup_io() at apic_setup_io+0x72/frame 0xffffffff826ffb50
mi_startup() at mi_startup+0x118/frame 0xffffffff826ffb70
start_kernel() at start_kernel+0x10

Fix this by removing the assert in the MSI code, since it's possible
to get to the MSI initialization without having registered any other
interrupt sources.

338630:
lapic: skip setting intrcnt if lapic is not present

Instead of panicking. Legacy PVH mode doesn't provide a lapic, and
since native_lapic_intrcnt is called unconditionally this would cause
the assert to trigger. Change the assert into a continue in order to
take into account the possibility of systems without a lapic.

338631:
xen: legacy PVH fixes for the new interrupt count

Register interrupts using the PIC pic_register_sources method instead
of doing it in apic_setup_io. This is now required, since the internal
interrupt structures are not yet setup when calling apic_setup_io.

338725:
Fix a regression in r338360 when booting an x86 machine without APIC.

The atpic_register_sources callback tries to avoid registering interrupt
sources that would collide with an I/O APIC.  However, the previous
implementation was failing to register IRQs 8-15 since the slave PIC
saw valid IRQs from the master and assumed an I/O APIC was present.  To
fix, go back to registering all 8259A interrupt sources in one loop when
the master's register_sources method is invoked.

PR:             229429, 130483, 231291

MFC r339924:
Implement the dump_stack() function in the LinuxKPI.

Submitted by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies

MFC r339923:
Implement __KERNEL_DIV_ROUND_UP() function macro in the LinuxKPI.

Submitted by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies

MFC r339868:
Implement dma_pool_zalloc() in the LinuxKPI.

Submitted by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies

MFhead r339643:

Fix ipw_start(), where logic was reverted in r287197.

PR: 232554

MFC r313557 (by bz):
Allow Dtrace to be compiled into the kernel again after r313177.

PR: 232825

MFC r339586:

  In bhyve's fbuf emulation improve the overall "usage" message and
  for the vga option, rather than printing the entire option string,
  only print vga (as we do for everything else).

MFC r339681:

  Allow the bhyve VNC server to listen on IPv6 for incoming connections.

  As an alternative to an IPv4 address:port, this allows listening on an IPv6
  link-local address (incl. scope), a specific address, or ::.  Addresses have
  to be given in RFC2732 format so that [::]:port parsing will work.

  This patch also starts to introduce WITH_INET/INET6_SUPPORT to bhyve.

PR: 232018
  Submitted by: Dave Rush (northwoodlogic.free gmail.com) (original)
  Reviewed by: Dave Rush (updated version)
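
Why the RFC2732 bracket form matters for r339681 can be seen in a small
parsing sketch. This is not bhyve's parser; split_host_port is a
hypothetical helper that only shows how brackets keep the colons of an IPv6
address apart from the port separator.

```python
from typing import Optional, Tuple


def split_host_port(spec: str) -> Tuple[str, Optional[int]]:
    """Split 'host:port' or '[v6addr]:port' into (host, port or None)."""
    if spec.startswith("["):
        end = spec.find("]")
        if end == -1:
            raise ValueError("missing closing bracket")
        host = spec[1:end]
        rest = spec[end + 1:]
        if rest == "":
            return host, None
        if not rest.startswith(":"):
            raise ValueError("unexpected text after ']'")
        return host, int(rest[1:])
    # No brackets: the last colon, if any, separates host and port.
    host, sep, port = spec.rpartition(":")
    if not sep:
        return spec, None
    return host, int(port)


print(split_host_port("[::]:5900"))           # ('::', 5900)
print(split_host_port("[fe80::1%em0]:5901"))  # ('fe80::1%em0', 5901)
print(split_host_port("127.0.0.1:5900"))      # ('127.0.0.1', 5900)
```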

MFC r339848: Import tzdata 2018g

MFC 338408: Don't directly dereference a user pointer in the VPD ioctl.

The PCIOCLISTVPD ioctl on /dev/pci is used to fetch a list of VPD
key-value pairs for a specific PCI function.  It is used by
'pciconf -l -V'.  The list is stored in a userland-supplied buffer as
an array of variable-length structures where the key and data length
are stored in a fixed-size header followed by the variable-length
value as a byte array.  To facilitate walking this array in userland,
<sys/pciio.h> provides a PVE_NEXT() helper macro to return a pointer
to the next array element by reading the length out of the current
header and using it to compute the address of the next header.

To simplify the implementation, the ioctl handler was also using
PVE_NEXT() on the user address of the user buffer to compute the
user address of the next array element.  However, the PVE_NEXT() macro
when used with a user address was reading the value's length by
dereferencing the user pointer.  The value was read after the current
record had been copied out to the user buffer, so it appeared to work
on architectures where user addresses are directly dereferenceable from
the kernel (all but powerpc and i386 after the 4:4 split).  The recent
enablement of SMAP on amd64 caught this violation however.  To fix,
add a variant of PVE_NEXT() for use in the ioctl handler that takes an
explicit value length.
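
The difference between the two walking styles can be sketched as follows.
This is an illustration under assumptions, not the contents of
<sys/pciio.h> or the committed handler: struct sketch_elem, SKETCH_NEXT(),
SKETCH_NEXT_LEN() and sketch_emit() are hypothetical names, and memcpy()
stands in for the kernel's copyout().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: fixed header, then `datalen` value bytes. */
struct sketch_elem {
	char	keyword[2];
	uint8_t	flags;
	uint8_t	datalen;
};

/* Userland-style walk: reads the length through the pointer itself. */
#define	SKETCH_NEXT(e)							\
	((struct sketch_elem *)((char *)(e) +				\
	    sizeof(struct sketch_elem) + (e)->datalen))

/*
 * Kernel-style walk: the caller supplies the length it already knows, so
 * the destination pointer is used for address arithmetic only and is
 * never dereferenced.
 */
#define	SKETCH_NEXT_LEN(e, len)						\
	((struct sketch_elem *)((char *)(e) +				\
	    sizeof(struct sketch_elem) + (len)))

/* Emit one record at dst and return the address of the next slot. */
static struct sketch_elem *
sketch_emit(struct sketch_elem *dst, const char key[2], const void *val,
    uint8_t len)
{
	struct sketch_elem hdr = { { key[0], key[1] }, 0, len };

	memcpy(dst, &hdr, sizeof(hdr));		/* stand-in for copyout() */
	memcpy((char *)dst + sizeof(hdr), val, len);
	return (SKETCH_NEXT_LEN(dst, len));
}
```

Both macros compute the same address; the second simply avoids touching
memory behind the pointer, which is what matters when that pointer refers to
user space.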

MFC 338148: Remove 'imen' global variable from atpic(4).

In pre-SMPng, the global 'imen' was used to track mask state of the
hardware interrupts and was aligned to the masks used by spl*().
When the atpic code was converted to using the x86 interrupt source
abstraction, the global 'imen' was preserved by having each PIC
instance point to an individual byte in the global 'imen' to hold its
8-bit interrupt mask. The global 'imen' is no longer used for
anything however, so rather than storing pointers in 'struct atpic',
just store the individual 8-bit mask for each PIC as a char.

While here, convert the ATPIC macro to using C99 initializers.

MFC r339366
Add support for Error Recovery

Submitted by: Vaishali.Kulkarni@cavium.com

MFC r338734

Fixed issues:
State check before enqueuing transmit task in bxe_link_attn() routine.
State check before invoking bxe_nic_unload in bxe_shutdown().

Submitted by: Vaishali.Kulkarni@cavium.com

MFC 338101: Merge amd64 and i386 <machine/intr_machdep.h> headers.

MFC: 339585

    r339585:
        Do not drop UDP traffic when TXCSUM_IPV6 flag is on

        PR:             231797
        Submitted by:   whu
        Reviewed by:    dexuan
        Obtained from:  Kevin Morse
        Sponsored by:   Microsoft
        Differential Revision:  https://bugs.freebsd.org/bugzilla/attachment.cgi?id=198333&action=diff

MFC r339600:
Make sure returned value is checked and assert a valid refcount.
While at it fix a print: Unsigned types cannot be negative.

Reviewed by: kib, mjg
Differential revision: https://reviews.freebsd.org/D17616
Sponsored by: Mellanox Technologies

MFC r337528: add an option for ddb ps command to print process arguments

Sponsored by: Panzura

MFC r303648: Fix ddb "show proc" to show full arguments

PR: 200052

MFC r339587:
Added support for formula-based arbitrary baud rates, in contrast to
the current fixed values, which enables use of rates above 1 Mbps.
Improved the detection of HXD chips, and the status flag handling as
well.

Submitted by: Gabor Simon <gabor.simon75@gmail.com>
PR: 225932
Differential revision: https://reviews.freebsd.org/D16639
Sponsored by: Mellanox Technologies

MFC r339740:
Use correct format specifier to print setdscp action.

PR: 232642

Follow up on r331936. gets_s(3) will also fail in the same way that
gets(3) does. This was missed in r331936.

Reported by: emaste@

MFH (r305124): fix case where fd_lastfile is -1.

fix up more issues introduced by failing to have run TB
before r339767

fix i386 breakage caused by r339767

hwpmc: Enable hwpmc support for AMD Family 17H devices

Adds new counters and events for family 17H devices.
Adds libpmc support for family 17H devices.

Direct commit to 11 as this is supported by way of JSON
counter descriptions on 12 & HEAD.

Submitted by: Girish Nandibasappa
Differential Revision: https://reviews.freebsd.org/D17464

MFC r339618:

Define linuxkpi readq for 64-bit architectures. It is used by drm-kmod.
Currently the compiler picks up the definition in machine/cpufunc.h.

Add compiler memory barriers to read* and write*. The Linux x86
implementation of these functions uses inline asm with "memory" clobber.
The Linux x86 implementation of read_relaxed* and write_relaxed* uses the
same inline asm without "memory" clobber.

Implement ioread* and iowrite* in terms of read* and write* so they also
have memory barriers.

Qualify the addr parameter in write* as volatile.

Like Linux, define macros with the same name as the inline functions.

Only define 64-bit versions on 64-bit architectures because generally
32-bit architectures can't do atomic 64-bit loads and stores.

Regroup the functions a bit and add brief comments explaining what they do:
- __raw_read*, __raw_write*: atomic, no barriers, no byte swapping
- read_relaxed*, write_relaxed*: atomic, no barriers, little-endian
- read*, write*: atomic, with barriers, little-endian

Add a comment that says our implementation of ioread* and iowrite*
only handles MMIO and does not support port IO.
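
The distinction between the relaxed and the ordered accessors can be sketched
as below. This is a minimal illustration, not the LinuxKPI header: all
sketch_* names are hypothetical, the GCC/Clang inline-asm syntax is assumed,
and the little-endian byte swapping is omitted on the assumption of a
little-endian host.

```c
#include <stdint.h>

/* Compiler-only barrier: forbids reordering of memory accesses across it. */
#define	sketch_barrier()	__asm__ __volatile__("" ::: "memory")

/* Relaxed: a single volatile access, no barrier. */
static inline uint32_t
sketch_readl_relaxed(const volatile void *addr)
{
	return (*(const volatile uint32_t *)addr);
}

static inline void
sketch_writel_relaxed(uint32_t v, volatile void *addr)
{
	*(volatile uint32_t *)addr = v;
}

/* Ordered: the same access, bracketed by compiler barriers. */
static inline uint32_t
sketch_readl(const volatile void *addr)
{
	uint32_t v;

	sketch_barrier();
	v = *(const volatile uint32_t *)addr;
	sketch_barrier();
	return (v);
}

static inline void
sketch_writel(uint32_t v, volatile void *addr)
{
	sketch_barrier();
	*(volatile uint32_t *)addr = v;
	sketch_barrier();
}
```

The volatile qualifier keeps each access from being elided or split, while
the barrier additionally keeps the compiler from moving unrelated loads and
stores across it.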

Reviewed by: hselasky

MFC r339684:
Reduce the GCE image size to 27G to be lower than the free
quota limit.

PR: 232313
Sponsored by: The FreeBSD Foundation

MFC r339582:
Drop sequencer mutex around uiomove() and make sure we don't move more bytes
than are available, otherwise a panic might happen.

Found by: Peter Holm <peter@holm.cc>
Sponsored by: Mellanox Technologies

MFC r339581:
Fix off-by-one which can lead to panics.

Found by: Peter Holm <peter@holm.cc>
Sponsored by: Mellanox Technologies

MFC r339584 :
mlx5: Notify user that the ConnectX-6 shut down its port due to power limitation

If the power exceeds the slot limit, or the slot limit is unknown, the
ConnectX-6 firmware will shut down its port.
Inform the user via a debug message.

Approved by: hselasky (mentor), kib (mentor)
Sponsored by: Mellanox Technologies

elfcopy: avoid stripping relocations from static binaries

MFC r339350: elfcopy: delete filter_reloc, it is broken and unnecessary

elfcopy contained logic to filter individual relocations in STRIP_ALL
mode.  However, this is not valid; relocations emitted by the linker are
required, unless they apply to an entire section being removed (which is
handled by other logic in elfcopy).

Note that filter_reloc was also buggy: for RELA relocation sections it
operated on uninitialized rel.r_info resulting in invalid operation.

The logic most likely needs to be inverted: instead of removing
relocations because their associated symbols are being removed, we must
keep symbols referenced by relocations.  That said, in practice we do
not encounter this code path today: objects being stripped are either
dynamically linked binaries which retain .dynsym, or static binaries
with no relocations.

Just remove filter_reloc.  This fixes certain cases including statically
linked binaries containing ifuncs.  Stripping binaries with relocations
referencing removed symbols was already broken, and after this change
may still be broken in a different way.

MFC r339451: objcopy: restore behaviour required by GCC's build

In r339350 filter_reloc() was removed, to fix the case of stripping
statically linked binaries with relocations (which may come from ifunc
use, for example).  As a side effect this changed the behaviour when
stripping object files - the output was broken both before and after
r339350, in different ways.  Unfortunately GCC's build process relies
on the previous behaviour, so:

- Revert r339350, restoring filter_reloc().
- Fix an uninitialized variable use (committed as r3638 in ELF Tool Chain).
- Change filter_reloc() to omit relocations referencing removed
  symbols, while retaining relocations with no symbol reference.
- Retain the entire relocation section if it references the dynamic
  symbol table (fix from kaiw in D17596).

PR: 232176
Sponsored by: The FreeBSD Foundation

MFC r339509: Fix loader.conf(5) "password" feature

Restore the ability to prevent the user from interrupting the boot process
without first entering the password stored in loader.conf(5).

PR: kern/207069
Reported by: david@dcrosstech.com
Sponsored by: Smule, Inc.

MFC r339547:

vlan: Fix panic with lagg and vlan

vlan_lladdr_fn() is called from taskqueue, which means there's no vnet context
set. We can end up trying to send ARP messages (through the iflladdr_event
event), which requires a vnet context.

PR: 227654