tuexen [Thu, 25 Feb 2016 19:21:46 +0000 (19:21 +0000)]
MFC r295273:
In FreeBSD 10 and higher the driver announces SCTP checksum offloading support
also for 82598, which doesn't support it.
The legacy code has a check for it, which was missed when the code for dealing with
CSUM_IP6_* was added. Add the same check for FreeBSD 10 and higher.
Approved by: re (marius)
Differential Revision: D5192
erj [Thu, 25 Feb 2016 19:15:06 +0000 (19:15 +0000)]
MFC r295323:
Update em(4) to 7.6.1; update igb(4) to 2.5.3.
Major changes:
- Add i219/i219(2) hardware support. (Found on Skylake generation and newer
chipsets.)
- Further to the last Skylake support diff, this one also includes support for
the Lewisburg chipset (i219(3)).
- Add a workaround to an igb hardware errata.
All 1G server products need to have IPv6 extension header parsing turned off.
This should be listed in the specification updates for current 1G server
products, e.g. for i350 it's errata #37 in this document:
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/ethernet-controller-i350-spec-update.pdf
- Avoton (i354) PHY errata workaround added
And a bunch of minor fixes, as well as #defines for things that the current
em(4)/igb(4) drivers don't implement.
MFC r287465:
igb(4): Update and fix HW errata
- HW errata workaround for IPv6 offload w/ extension headers
- Edited start of if_igb.c (Device IDs / #includes) to match ixgbe/ixl
Approved by: re (gjb)
Sponsored by: Intel Corporation
tuexen [Thu, 25 Feb 2016 18:46:06 +0000 (18:46 +0000)]
MFC r295549:
Loopback addresses are 127.0.0.0/8, not 127.0.0.1/32.
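
The difference between the two prefixes can be illustrated with a minimal sketch using Python's standard ipaddress module. The function names are hypothetical and are not taken from the SCTP code; they only contrast a /32 check with a /8 check.

```python
import ipaddress

# The whole 127.0.0.0/8 block is reserved for loopback, not just 127.0.0.1.
LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def is_loopback_narrow(addr: str) -> bool:
    """Too-narrow check: treats only 127.0.0.1 as loopback."""
    return ipaddress.ip_address(addr) == ipaddress.ip_address("127.0.0.1")


def is_loopback(addr: str) -> bool:
    """Prefix check: any address inside 127.0.0.0/8 is loopback."""
    return ipaddress.ip_address(addr) in LOOPBACK_NET
```

An address such as 127.1.2.3 is rejected by the narrow check but accepted by the prefix check.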
MFC r295668:
Improve the teardown of the SCTP stack.
MFC r295670:
Whitespace changes.
MFC r295708:
Address a warning reported by D5245 / PVS.
MFC r295709:
Code cleanup which will silence a warning in PVS / D5245.
MFC r295710:
Add protection code for issues reported by PVS / D5245.
MFC r295771:
Fix reporting of mapped addresses in getpeername() and getsockname() for
IPv6 SCTP sockets.
These bugs were found because of an issue reported by PVS / D5245.
MFC r295772:
Add some protection code.
MFC r295773:
Add protection code.
MFC r295805:
Use the SCTP level pointer, not the interface level.
MFC r295929:
Don't leak an address in an error path.
araujo [Thu, 25 Feb 2016 15:33:55 +0000 (15:33 +0000)]
MFH 295796 (based on)
Fix regression introduced in r272446. lagg(4) supports the protocol none,
where it disables any traffic without disabling the lagg(4) interface itself.
PR: 206478
Submitted by: Erin Clark <erin.clark.ix@gmail.com>
Reviewed by: rpokala, bapt
Approved by: re (glebius)
Differential Revision: https://reviews.freebsd.org/D5188
r294933:
Drop any previous fd when setting a new one.
r294949:
filemon_ioctl: Handle error from devfs_get_cdevpriv(9).
r294952:
filemon_ioctl: Lock the associated filemon handle before writing to it.
r294953:
filemon_comment has nothing to do with wrappers so move it out of
filemon_wrapper.c.
r294957:
filemon_dtr: Lock the associated filemon handle before writing to it.
r294965:
filemon: Use process_exit EVENTHANDLER to capture process exit.
r294967:
filemon: Trace fork via process_fork event.
r294968:
Follow-up r294967: Mark flags unused.
r295017:
filemon: Use process_exec EVENTHANDLER to capture sys_execve.
r295026:
filemon_open: Don't record a process to trace here.
r295027:
filemon: Track the process pointer rather than a pid.
r295029:
Document the purpose and non-purpose of filemon(4).
r295030:
Note the double fork behavior with filemon.
r295649:
filemon: Fix panic when fork1() is called from kproc_create().
jhb [Wed, 24 Feb 2016 22:01:45 +0000 (22:01 +0000)]
MFC 295636,295637:
Fix issues with tracing Linux/i386 binaries.
295636:
Sign extend the error value for failing Linux/i386 system calls. This
restores the mapping of Linux errors to native FreeBSD errno values after
the refactoring in r288424.
295637:
Correct the ABI name for Linux/i386 binaries under FreeBSD/i386.
This allows truss to work for these binaries again after r288424.
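
As a rough illustration of why sign extension matters here: a failing Linux system call reports a negative errno in a 32-bit register, and if that value is widened without sign extension it looks like a large positive number rather than a small negative one. The sketch below is not the kernel or truss code; the helper name is hypothetical.

```python
def sign_extend_32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement number."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        # Top bit set: the 32-bit pattern encodes a negative number.
        return value - (1 << 32)
    return value


# A 32-bit register holding -2 has the bit pattern 0xFFFFFFFE.
raw = 0xFFFFFFFE
zero_extended = raw                 # widened without sign extension
sign_extended = sign_extend_32(raw)
```

Only the sign-extended value can be negated to recover the small positive error number that an errno mapping expects.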
kib [Wed, 24 Feb 2016 13:48:40 +0000 (13:48 +0000)]
MFC r295717:
After a nullfs rmdir operation, reclaim the directory vnode which was
unlinked. Otherwise the vnode stays cached, causing a leak. This is
similar to r292961 for regular files.
sephe [Wed, 24 Feb 2016 01:30:50 +0000 (01:30 +0000)]
MFC [Hyper-V]: r294553, r294700
r294553
hyperv/vmbus: Lookup channel through id table
Vmbus event handler will need to find the channel by its relative
id, when software interrupt for event happens. The original lookup
searches the channel list, which is not very efficient. We now
create a table indexed by the channel relative id to speed up
the channel lookup.
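
The change from a list search to an indexed table can be sketched as follows. This is a simplified model, not the vmbus code: the Channel class, the function names, and the fixed table size are assumptions made for illustration.

```python
from typing import List, Optional


class Channel:
    def __init__(self, relid: int, name: str) -> None:
        self.relid = relid
        self.name = name


def find_by_scan(channels: List[Channel], relid: int) -> Optional[Channel]:
    """Original approach: walk the channel list until the id matches (O(n))."""
    for ch in channels:
        if ch.relid == relid:
            return ch
    return None


def build_table(channels: List[Channel], max_relid: int) -> List[Optional[Channel]]:
    """Build a table whose index is the channel relative id."""
    table: List[Optional[Channel]] = [None] * (max_relid + 1)
    for ch in channels:
        table[ch.relid] = ch
    return table


def find_by_table(table: List[Optional[Channel]], relid: int) -> Optional[Channel]:
    """New approach: a bounds check plus one index operation (O(1))."""
    if 0 <= relid < len(table):
        return table[relid]
    return None
```

Both lookups return the same channel; the table version does constant work per event regardless of how many channels exist.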
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: delphij, adrian, sephe, Dexuan Cui <decui microsoft com>
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4802
-------------
r294700
hyperv/hn: Partly rework transmission path
- Avoid unnecessary malloc/free on transmission path.
- busdma(9)-fy transmission path.
- Properly handle IFF_DRV_OACTIVE. This should fix the network
stalls reported by many.
- Properly setup TSO parameters.
- Properly handle bpf(4) tapping. This yields 5 times the performance
during TCP sending test, when there is one bpf(4) attached.
- Allow the size of chimney sending to be tuned on a running system.
The default value still needs more testing to determine.
Reviewed by: adrian, delphij
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4972
Approved by: re (marius)
Sponsored by: Microsoft OSTC
marius [Tue, 23 Feb 2016 01:09:35 +0000 (01:09 +0000)]
In preparation for 10.3-RELEASE, temporarily revert the MFC of r291244
done as part of r292895 on stable/10, as that change causes hangs with
ZFS and the cause, on at least amd64, is so far not understood.
Discussed with: kib
For further information see:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-February/084045.html
marius [Mon, 22 Feb 2016 00:49:35 +0000 (00:49 +0000)]
MFC: r287299 [1]
Add a gop command to help diagnose VT efifb problems. The gop
command has the following sub-commands:
list - list all possible modes (paged)
get - return the current mode
set <mode> - set the current mode to <mode>
Add support for the UGA draw protocol. This includes adding a
command called 'uga' to show whether UGA is implemented by the
firmware and what the settings are. It also includes filling
the efi_fb structure from the UGA information when GOP isn't
implemented by the firmware.
r293719 hyperv/hn: Implement LRO
r293720 hyperv/hn: Implement SIOC[SG]IFMEDIA support
r293721 hyperv/hn: Avoid mbuf cluster allocation, if the packet is small.
r293722 hyperv/hn: Removed unused netvsc_init()
r293869 hyperv/hn: Unbreak LINT-NOIP
r293870 hyperv: use x86 generic code to do the hypervisor detection
r293871 hyperv: remove unused vmbus definitions
r293873 hyperv: implement an event timer
r293874 hyperv: add interrupt counters
r293875 hyperv: set receive buffer size according to NVSP protocol version
r293877 Unbreak `make depend` with sys/modules/hyperv/vmbus after r293870
Approved by: re (glebius), adrian (mentor)
Sponsored by: Microsoft OSTC
pfg [Wed, 17 Feb 2016 19:09:06 +0000 (19:09 +0000)]
MFC r295616:
ext2fs: Remove panics for rename() race conditions.
Sync with r84642 from UFS:
The panics are inappropriate because the IN_RENAME flag only fixes a
few of the huge number of race conditions that can result in the
source path becoming invalid even prior to the VOP_RENAME() call.
jimharris [Wed, 17 Feb 2016 15:38:05 +0000 (15:38 +0000)]
MFC r295022:
nvd: add hw.nvd.delete_max tunable
The NVMe specification does not define a maximum or optimal delete
size, so technically max delete size is min(full size of namespace,
2^32 - 1 LBAs). A single delete operation for a multi-TB NVMe
namespace though may take much longer to complete than the nvme(4)
I/O timeout period. So choose a sensible default here that is still
suitably large to minimize the number of overall delete operations.
This also fixes possible uint32_t overflow on initial TRIM operation
for zpool create operations for NVMe namespaces with >4G LBAs.
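
A minimal sketch of the idea, assuming the limit is expressed in LBAs purely for illustration (the commit does not state the unit of hw.nvd.delete_max here, and the function names are hypothetical): a large delete is split into chunks no bigger than the configured maximum, and no chunk count ever exceeds what fits in a uint32_t.

```python
UINT32_MAX = 2**32 - 1


def split_delete(start_lba: int, num_lbas: int, delete_max_lbas: int):
    """Split one large delete into (lba, count) chunks of bounded size."""
    chunk_limit = min(delete_max_lbas, UINT32_MAX)
    chunks = []
    lba, remaining = start_lba, num_lbas
    while remaining > 0:
        count = min(remaining, chunk_limit)
        chunks.append((lba, count))
        lba += count
        remaining -= count
    return chunks


def truncated_to_uint32(num_lbas: int) -> int:
    """What a 32-bit count field would hold if the value were not split."""
    return num_lbas & UINT32_MAX
```

Without splitting, a request for 2^32 + 10 LBAs would be truncated to 10 in a 32-bit field, which is the kind of overflow the commit message describes.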
jimharris [Wed, 17 Feb 2016 15:36:02 +0000 (15:36 +0000)]
MFC r295532:
nvme: avoid duplicate SET_NUM_QUEUES commands
nvme(4) issues a SET_NUM_QUEUES command during device
initialization to ensure enough I/O queues exist for each
of the MSI-X vectors we have allocated. The SET_NUM_QUEUES
command is then issued again during nvme_ctrlr_start(), to
ensure that it is properly set after any controller reset.
At least one NVMe drive exists which fails this second
SET_NUM_QUEUES command during device initialization. So
change nvme_ctrlr_start() to only issue its SET_NUM_QUEUES
command when it is coming out of a reset - avoiding the
duplicate SET_NUM_QUEUES during device initialization.
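
The control flow described above can be modelled with a small sketch. It is not the driver code: the class, its methods, and the is_resetting flag are hypothetical stand-ins used only to show that the start path issues the command when coming out of a reset and not during initial setup.

```python
class Controller:
    def __init__(self) -> None:
        self.set_num_queues_calls = 0
        self.is_resetting = False

    def _set_num_queues(self) -> None:
        # Stand-in for sending the SET_NUM_QUEUES admin command.
        self.set_num_queues_calls += 1

    def construct(self) -> None:
        """Initial device initialization: issue the command once, then start."""
        self._set_num_queues()
        self.start()

    def reset(self) -> None:
        """Controller reset: the start path must re-issue the command."""
        self.is_resetting = True
        self.start()
        self.is_resetting = False

    def start(self) -> None:
        # Only re-issue when coming out of a reset, avoiding a duplicate
        # command during the first initialization.
        if self.is_resetting:
            self._set_num_queues()
```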
jhb [Tue, 16 Feb 2016 21:36:48 +0000 (21:36 +0000)]
MFC 295418,295419:
Fix hangs or panics when misbehaved kernel threads return from their
main function.
295418:
Mark proc0 as a kernel process via the P_KTHREAD flag.
All other kernel processes have this flag set and all threads in proc0
(including thread0) have the similar TDP_KTHREAD flag set.
295419:
Call kthread_exit() rather than kproc_exit() for a premature kthread exit.
Kernel threads (and processes) are supposed to call kthread_exit() (or
kproc_exit()) to terminate. However, the kernel includes a fallback in
fork_exit() to force a kthread exit if a kernel thread's "main" routine
returns. This fallback was added back when the kernel only had processes
and was not updated to call kthread_exit() instead of kproc_exit() when
threads were added to the kernel.
This mistake was particularly exciting when the errant thread belonged to
proc0. Due to the missing P_KTHREAD flag the fallback did not kick in
and instead tried to return to userland via whatever garbage was in the
trapframe. With P_KTHREAD set it tried to terminate proc0 resulting in
other amusements.
dumbbell [Mon, 15 Feb 2016 07:35:40 +0000 (07:35 +0000)]
drm/i915: Restore pci_enable_busmaster() call in the init path
This fixes a GPU hang on i945GM.
While here, merge some minor fixes to DRM core and i915:
* Remove obsolete drm_agp_*_memory() prototypes
* Fix comment in drm_fops.c (outisde -> outside)
* Fix some formatting issues in drm_stub.c (spaces -> tabs)
kib [Sun, 14 Feb 2016 17:21:19 +0000 (17:21 +0000)]
MFC r294595:
When devfs dirent is freed, a vnode might still keep a pointer to it,
apparently. Interlock and clear the pointer to avoid free memory
dereference.
imp [Thu, 11 Feb 2016 23:43:27 +0000 (23:43 +0000)]
Merge from current r294767,294769,295408
r294767: Parse command line arguments in loader.efi
r294769: Allow newlines to be treated as whitespace when parsing args
r295408: Implement -P command line option for EFI booting.
smh [Thu, 11 Feb 2016 17:56:09 +0000 (17:56 +0000)]
Fix ia64 build failures in EFI platform
The MFC of the recent EFI work to stable/10 caused build breakage
under ia64.
It was not apparent that there was EFI code outside the EFI tree as
this is not the case in HEAD, however in stable/10 there is for ia64.
This change does the following:
* Re-enables libefi for ia64 under gcc.
* Adds the ignore for unsupported pragmas when building libefi for ia64.
* Adds the missing parameter to efi_handle_lookup in the ia64 loader.
This is a direct commit as ia64 is no longer supported after 10.x
arybchik [Thu, 11 Feb 2016 16:39:30 +0000 (16:39 +0000)]
MFC r295467
sfxge: implement SIOCGI2C to read information from phy modules
The IOCTL is used by 'ifconfig -v' to show SFP+/QSFP+ information
including inventory information and diagnostics (temperature, light
levels, voltage etc).
Reviewed by: gnn,melifaro
Approved by: re (gjb)
Sponsored by: Solarflare Communications, Inc.
jhb [Wed, 10 Feb 2016 18:29:37 +0000 (18:29 +0000)]
Adjust initialization of random(9) so it is usable earlier.
A few existing SYSINITs expect the in-kernel PRNG (random(9)) to be
useable at SI_SUB_RANDOM / SI_ORDER_ANY. However, the random(4) overhaul
merged for 10.0 performs all of its initialization at SI_SUB_DRIVERS
(since it is tied in with creating the /dev/random character device).
This has changed in HEAD where the random initialization is split such
that the in-kernel random(9) is initialized at SI_SUB_RANDOM and the
supporting bits for userland random(4) (such as /dev/random) are initialized
later.
However, the changes in HEAD are large and invasive. Instead, this change
is being directly committed to stable/10.
This change moves most of the random(9)/random(4) initialization to
SI_SUB_RANDOM with the exception that the creation of the harvesting kernel
process and the /dev/random character device are deferred to new
SYSINITs that run at SI_SUB_DRIVERS.
This fixes the "random device not loaded; using insecure entropy" message
output during boot on some systems.
PR: 205800
Reviewed by: markm, so@
Approved by: so
Approved by: re (gjb)
Tested by: Mark Saad <nonesuch@longcount.org>
cy [Wed, 10 Feb 2016 07:16:17 +0000 (07:16 +0000)]
MFC r289421, r293037, r294773, and r294884.
ntp leap-seconds support.
r289421:
Add default leap-seconds file. This should help ntp networks get the
leap second date correct.
Updates to the file can be obtained from ftp://time.nist.gov/pub/ or
ftp://tycho.usno.navy.mil/pub/ntp/.
r293037:
Update leap-seconds to latest. This will satisfy the ntpd leap-second
version check.
r294773:
Add support for automatic leap-second file updates.
The working copy of leapfile resides in /var/db/ntpd.leap-seconds.list.
/etc/ntp/leap-seconds (periodically updated from ftp://time.nist.gov/pub/
or ftp://tycho.usno.navy.mil/pub/ntp/) contains the master copy should
automatic leapfile updates be disabled (default).
Automatic leapfile updates are fetched from $ntp_leapfile_sources,
defaulting to https://www.ietf.org/timezones/data/leap-seconds.list,
within $ntp_leapfile_expiry_days (default 30 days) from leap-seconds
file expiry. Automatic updates can be enabled by setting
$daily_ntpd_leapfile_enable="YES" in periodic.conf. To avoid congesting
the ntp leapfile source, the automatic update is randomized by default but
can be disabled through daily_ntpd_avoid_congestion="NO" in
periodic.conf.
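
The "within N days of expiry" condition can be sketched as a simple date comparison. This is only an illustration of the rule as worded above, not the periodic script itself; the function name is hypothetical and the 30-day default mirrors the stated default of $ntp_leapfile_expiry_days.

```python
from datetime import datetime, timedelta


def needs_leapfile_update(expiry: datetime, now: datetime,
                          expiry_days: int = 30) -> bool:
    """True when the leap-seconds file expires within expiry_days of now
    (or has already expired)."""
    return expiry - now <= timedelta(days=expiry_days)
```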
r294884:
Allow specification of fetch options for ntp leap-seconds fetch.
jhb [Wed, 10 Feb 2016 00:08:51 +0000 (00:08 +0000)]
MFC 287442,287537,288944:
Fix corruption of coredumps due to procstat notes changing size during
coredump generation. The changes in r287442 required some reworking
since the 'fo_fill_kinfo' file op does not exist in stable/10.
Coredump notes depend on being able to invoke dump routines twice; once
in a dry-run mode to get the size of the note, and another to actually
emit the note to the corefile.
When a note helper emits a different length section the second time
around than the length it requested the first time, the kernel produces
a corrupt coredump.
NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' fd table
via vn_fullpath. As vnodes may move around during dump, this is racy.
So:
- Detect badly behaved notes in putnote() and pad underfilled notes.
- Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to
exercise the NT_PROCSTAT_FILES corruption. It simply picks random
lengths to expand or truncate paths to in fo_fill_kinfo_vnode().
- Add a sysctl, kern.coredump_pack_fileinfo, to allow users to
disable kinfo packing for PROCSTAT_FILES notes. This should avoid
both FILES note corruption and truncation, even if filenames change,
at the cost of about 1 kiB in padding bloat per open fd. Document
the new sysctl in core.5.
- Fix note_procstat_files to self-limit in the 2nd pass. Since
sometimes this will result in a short write, pad up to our advertised
size. This addresses note corruption, at the risk of sometimes
truncating the last several fd info entries.
- Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the
zero padding.
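
The two-pass scheme and the padding fix can be sketched as follows. This is a simplified model rather than the kernel's putnote(): the function name and the callable-based producer are assumptions. The first call measures the note, the second produces it, and the output is truncated or zero-padded so that it always matches the advertised size.

```python
def write_note(produce, out: bytearray) -> int:
    """Emit one note whose length always equals the size advertised in pass 1."""
    advertised = len(produce())        # pass 1: dry run, size only
    payload = produce()[:advertised]   # pass 2: self-limit to advertised size
    payload += b"\0" * (advertised - len(payload))  # pad a short write
    out += payload
    return advertised
```

If the producer returns fewer bytes on the second pass, the remainder is zero padding that consumers must tolerate; if it returns more, the tail is truncated. Either way the surrounding file layout stays consistent.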
287537:
Follow-up to r287442: Move sysctl to compiled-once file
Avoid duplicate sysctl nodes.
288944:
Fix core corruption caused by race in note_procstat_vmmap
This fix is spiritually similar to r287442 and was discovered thanks to
the KASSERT added in that revision.
NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' vm map
via vn_fullpath. As vnodes may move during coredump, this is racy.
We do not remove the race, only prevent it from causing coredump
corruption.
- Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable
kinfo packing for PROCSTAT_VMMAP notes. This avoids VMMAP corruption
and truncation, even if names change, at the cost of up to PATH_MAX
bytes per mapped object. The new sysctl is documented in core.5.
- Fix note_procstat_vmmap to self-limit in the second pass. This
addresses corruption, at the cost of sometimes producing a truncated
result.
- Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste)
to grok the new zero padding.
emaste [Tue, 9 Feb 2016 22:32:24 +0000 (22:32 +0000)]
MFC boot loader path and RBX constant deduplication
r294765 (imp)
Move all the separate copies of the same strings into paths.h. There's
nothing machine specific about these.
r294765 (imp)
RBX_ defines are in rbx.h, move it there.
r294847 (imp)
Remove static from these two. They slipped through the cracks.
r294925 (imp)
Fix mistake when transitioning to the new defines with ZFS loader. I
hate adding yet another define, but it is the lesser of the evil
choices available. Kill another evil by removing PATH_BOOT3 and
replacing it with PATH_LOADER or PATH_LOADER_ZFS as appropriate.
Since r256624 (head) we have been leaking routing table allocations
on vnet enabled jail shutdown. Call the provided cleanup
routines for IP versions 4 and 6 to plug these leaks.
Sponsored by: The FreeBSD Foundation
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D4530
Try to fix a bug introduced in r228623 (head).
We started to copy the ifa_msghdr as otherwise platforms with strict
alignment would break. It is unclear to me if there's also a problem with
access to the address list following the structure.
However we never copied the address list after the structure and thus are
pointing at random memory. For now just use a pointer to the original
memory for accessing the address list making it at least work on
platforms with weak memory access.
PR: 195445
Reported by: wolfgang lyxys.ka.sub.org
Tested by: wolfgang lyxys.ka.sub.org (x86)
pfg [Sat, 6 Feb 2016 16:58:56 +0000 (16:58 +0000)]
MFC r295209:
Revert r294695; passthrough any extra timestamps to the dinode struct.
The original ext2fs change worked fine on disks formatted with default
values, but it was the cause of a regression when inodes are small.
Revert it for now, while we figure out safer ways to pass such values.
marius [Thu, 4 Feb 2016 23:56:01 +0000 (23:56 +0000)]
MFC: r295133
As it turns out, one of the more or less recent changes to em(4)
causes watchdog timeouts when using TSO4 at link speeds below
Gigabit, at least with 82573E. So disable the assist automatically
when at lower speeds.
gnn [Thu, 4 Feb 2016 22:53:12 +0000 (22:53 +0000)]
MFC: r290383,295282,295283
Replace the fastforward path with tryforward which does not require a
sysctl and will always be on. The former split between default and
fast forwarding is removed by this commit while preserving the ability
to use all network stack features.
tuexen [Wed, 3 Feb 2016 14:04:07 +0000 (14:04 +0000)]
MFC r294995:
Always look in the TCP pool.
This fixes issues with a restarting peer when the listening
1-to-1 style socket is closed.
MFC r295021:
Remove debug output which was committed by accident.
Thanks to Oliver Pinter for reporting.
MFC r295069:
Ignore peer addresses in a consistent way also when checking for
new addresses during restart. If this is not done, restart doesn't
work when the local socket is IPv4 only and the peer uses
IPv4 and IPv6 addresses.
MFC r295070:
Don't change the remote UDP encapsulation port for SCTP packets
containing an INIT chunk.
MFC r295072:
Don't allow a remote encapsulation port change during the
SCTP restart procedure.
MFC r295075:
Update the path mtu when turning on/off UDP encapsulation for SCTP.
MFC r295077:
Add missing parentheses. This was reported by ccaughie via GitHub
for the userland stack.
kib [Tue, 2 Feb 2016 14:16:07 +0000 (14:16 +0000)]
MFC r294311:
Clear whole XMM register file instead of only XMM0. Also clear x87
registers. This brings amd64 on par with i386, providing consistent
initial FPU state.
PR: 206370
MFC r294312:
Use ANSI definitions. Wrap long line.
MFC r294313:
Adjust i386 comment to match amd64 one after r294311.
jhb [Mon, 1 Feb 2016 23:07:31 +0000 (23:07 +0000)]
MFC 278320,278336,278830,285621:
Add devctl(8): a utility for manipulating new-bus devices. Note that
this version does not include the 'suspend' and 'resume' commands
present in HEAD as those depend on larger changes to the suspend and
resume code in the kernel.
278320:
Add a new device control utility for new-bus devices called devctl. This
allows the user to request administrative changes to individual devices
such as attaching or detaching drivers or disabling and re-enabling devices.
- Add a new /dev/devctl2 character device which uses ioctls for device
requests. The ioctls use a common 'struct devreq' which is somewhat
similar to 'struct ifreq'.
- The ioctls identify the device to operate on via a string. This
string can either be the device's name, or it can be a bus-specific
address. (For unattached devices, a bus address is the only way to
locate a device.) Bus drivers register an eventhandler to claim
unrecognized device names that the driver recognizes as a valid address.
Two buses currently support addresses: ACPI recognizes any device
in the ACPI namespace via its full path starting with "\" and
the PCI bus driver recognizes an address specification of
'pci[<domain>:]<bus>:<slot>:<func>' (identical to the PCI selector
strings supported by pciconf).
- To make it easier to cut and paste, change the PnP location string
in the PCI bus driver to output a full PCI selector string rather
than 'slot=<slot> function=<func>'.
- Add a devctl(3) interface in libdevctl which provides a wrapper around
the ioctls and is the preferred interface for other userland code.
- Add a devctl(8) program which is a simple wrapper around the requests
supported by devctl(3).
- Add a resource_unset_value() function that can be used to remove a
hint from the kernel environment. This is used to clear a
hint.<driver>.<unit>.disabled hint when re-enabling a boot-time
disabled device.
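
The PCI address format quoted above, 'pci[<domain>:]<bus>:<slot>:<func>', can be parsed with a short sketch like the one below. It is not the bus driver's parser; the function name is hypothetical, and defaulting an omitted domain to 0 is an assumption made for illustration.

```python
import re
from typing import Optional, Tuple

# Optional "<domain>:" followed by the mandatory "<bus>:<slot>:<func>".
_SELECTOR = re.compile(r"^pci(?:(\d+):)?(\d+):(\d+):(\d+)$")


def parse_pci_selector(sel: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (domain, bus, slot, func), or None if sel is not a selector."""
    m = _SELECTOR.match(sel)
    if m is None:
        return None
    domain, bus, slot, func = m.groups()
    # Assumption: an omitted domain means domain 0.
    return (int(domain) if domain is not None else 0,
            int(bus), int(slot), int(func))
```

A three-field string such as pci0:31:2 is read as bus 0, slot 31, function 2, while a four-field string carries the domain first.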
278336:
Unbreak the build (memchr is explicitly required by devctl(9) after r278320)
marius [Mon, 1 Feb 2016 22:16:41 +0000 (22:16 +0000)]
MFC: r295032
Use '^[>+][^+]' instead of '^[>+]' with grep(1) when filtering the
diff(1) output between two files in "new_only"-mode. Otherwise,
with the default of using unified format a remnant of the header
in the output is the result. This is especially irritating when
the two files differ but the second one is empty, amounting to the
vestige of the header being the only readout.
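
The effect of the two patterns can be reproduced with a small sketch that uses Python's difflib and re in place of diff(1) and grep(1). The helper name and the file labels "a" and "b" are hypothetical; the regular expressions are the ones quoted above.

```python
import difflib
import re


def new_only(old_lines, new_lines, pattern):
    """Return the unified-diff lines that match pattern, like diff | grep."""
    diff = difflib.unified_diff(old_lines, new_lines, "a", "b", lineterm="")
    rx = re.compile(pattern)
    return [line for line in diff if rx.search(line)]


old = ["x", "y"]

# Second file empty: the old pattern keeps only the "+++" header line.
header_only = new_only(old, [], r"^[>+]")
nothing = new_only(old, [], r"^[>+][^+]")

# Second file with one added line: the new pattern keeps just the addition.
added = new_only(old, ["x", "y", "z"], r"^[>+][^+]")
```

One side effect of the stricter pattern, visible in the expression itself, is that an added line whose own content begins with "+" would also be filtered out.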
Reported by: Stefan Haemmerl
dteske [Mon, 1 Feb 2016 00:44:29 +0000 (00:44 +0000)]
MFC revisions 294860,294862,294892-294893,294922
r294860: Add keep_tite configuration option
r294862: Bump copyrights
r294892: Remove unused function prototype
r294893: Fix a crash if `-D' is used without `-t title'
r294922: Fix fatal warn when compiling under GCC 5.2.0
brooks [Thu, 28 Jan 2016 22:57:09 +0000 (22:57 +0000)]
MFC r294515:
Fix the implementations of PSEUDO_NOERROR and PSEUDO.
The PSEUDO* macros should not declare <syscall>, only _<syscall> and
__sys_<syscall>. This was causing the interposing C wrappers to be
ignored due to link order.
hiren [Thu, 28 Jan 2016 21:30:49 +0000 (21:30 +0000)]
MFC r294840
Persist timers TCPTV_PERSMIN and TCPTV_PERSMAX are hardcoded with 5 seconds and
60 seconds, respectively. Turn them into sysctls that can be tuned live. The
default values of 5 seconds and 60 seconds have been retained.
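
How a minimum and maximum bound a persist timeout can be shown with a minimal sketch. The doubling backoff here is a simplification chosen for illustration, not the stack's actual backoff schedule, and the function and parameter names are hypothetical; only the 5 second and 60 second defaults come from the text above.

```python
def persist_timeout(base: float, shift: int,
                    persmin: float = 5.0, persmax: float = 60.0) -> float:
    """Back off the base interval, then clamp it to [persmin, persmax] seconds."""
    backed_off = base * (2 ** shift)   # simplified exponential backoff
    return max(persmin, min(backed_off, persmax))
```

With tunable bounds, lowering the minimum lets short intervals through and raising the maximum lets the backoff grow further, without touching the backoff logic itself.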