CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

Alan Somers [Mon, 12 Jun 2017 14:54:59 +0000 (14:54 +0000)]

bin/ln: Set umask appropriately before creating files for testing

These changes were missed in D11084

Submitted by: shivansh
Reviewed by: asomers
MFC after: 1 month
X-MFC-With: 319714
Sponsored by: Google, Inc (GSoC 2017)
Differential Revision: https://reviews.freebsd.org/D11158

Ed Maste [Mon, 12 Jun 2017 13:49:57 +0000 (13:49 +0000)]

makefs: use C standard memcpy/memset in userland

This file does not exist in NetBSD's makefs, but make the change for
consistency with memcpy/memset used in the rest of makefs.

Sponsored by: The FreeBSD Foundation

Xin LI [Mon, 12 Jun 2017 09:11:31 +0000 (09:11 +0000)]

Fix buffer lengths.

After r319369, the RPC code validates caller supplied buffer length in
taddr2uaddr. When no -h is specified, the sizeof(ai_addr) is used,
which is always smaller than the required size and therefore uaddr
would be NULL, causing the kernel to copyin() from userland NULL
and fail with EFAULT.

Reviewed by: kevlo (via Telegram)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D11151

Dmitry Chagin [Mon, 12 Jun 2017 07:48:51 +0000 (07:48 +0000)]

Remove the outdated definition.

MFC after: 1 week

Enji Cooper [Mon, 12 Jun 2017 07:43:58 +0000 (07:43 +0000)]

Add some initial basic tests for du(1)

Tests that exercise the following flags are added in this commit:
- -A
- -H
- -I
- -g
- -h
- -k
- -m

Additional tests will be added soon.

MFC after: 1 month

Dmitry Chagin [Mon, 12 Jun 2017 07:35:59 +0000 (07:35 +0000)]

Since r318735 (ino64 project) the size of the native struct dirent is
equal to or greater than the size of Linux struct dirent or struct dirent64.
So, remove LINUX_RECLEN_RATIO magic as useless.

Cy Schubert [Mon, 12 Jun 2017 06:08:57 +0000 (06:08 +0000)]

-v (verbose) is not a command option. (See ippool.1 for a definition
of command options).

Enji Cooper [Mon, 12 Jun 2017 05:11:43 +0000 (05:11 +0000)]

Add some testcases for `diff --side-by-side` support

These were created proactively, in anticipation of the support being
fully implemented sometime in the future.

The tests currently fail on ^/head@r319845, however. Expect them to fail.

PR: 219933
Tested with: gdiff

Enji Cooper [Mon, 12 Jun 2017 02:42:39 +0000 (02:42 +0000)]

du(1): trivial whitespace cleanup

MFC after: 1 month

Enji Cooper [Mon, 12 Jun 2017 02:38:37 +0000 (02:38 +0000)]

Remove stdlib.h #include added in r319844

A previous iteration of the tests I added in r319844 involved free(3), but
that attempt didn't pan out, so I switched to stack allocated buffers instead
of heap allocated ones, making the #include unnecessary.

MFC after: 1 month
MFC with: r319844

Enji Cooper [Mon, 12 Jun 2017 02:12:22 +0000 (02:12 +0000)]

Add positive and negative testcases for cam_get_device(3)

MFC after: 1 month
Submitted by: Evan Cramer <evan.cramer@isilon.com>

Gregory Neil Shapiro [Mon, 12 Jun 2017 01:26:36 +0000 (01:26 +0000)]

Fix 'restart' action: rc.subr only expects to restart one service, not two.

PR: 217393
Reported by: Martin Simmons
MFC after: 1 week

Enji Cooper [Mon, 12 Jun 2017 00:43:14 +0000 (00:43 +0000)]

getbsize(3): clarify that underflow/overflow warnings in regard to $BLOCKSIZE
get output via warnx(3)

This helps set expectations for how one might deal with those messages, i.e.,
mute output from /dev/stderr, since that's where vwarn(3) outputs messages
today.

MFC after: 1 month

Enji Cooper [Mon, 12 Jun 2017 00:21:55 +0000 (00:21 +0000)]

Add initial tests for stat(1)

Testcases for -H, -L, and -f haven't been implemented yet, in part due
to additional complexity needed to validate the features:
* -H and -f will require an external "helper" program to display/modify
the state/permissions for a given path.
* -L is being covered partially via the -n testcase today.

MFC after: 1 month

Enji Cooper [Sun, 11 Jun 2017 21:23:54 +0000 (21:23 +0000)]

stat(1): sort flags in the DESCRIPTION section

-x's description should come after -t's description.

MFC after: 1 month
Sponsored by: Dell EMC Isilon

Enji Cooper [Sun, 11 Jun 2017 21:13:12 +0000 (21:13 +0000)]

Write up some basic tests for readlink(1)

The tests exercise -f (f_flag), -n (n_flag), and no arguments (basic).

MFC after: 1 month
Sponsored by: Dell EMC Isilon

Enji Cooper [Sun, 11 Jun 2017 19:31:42 +0000 (19:31 +0000)]

Add more simple positive tests for chown(1)

The tests are largely symmetric with the tests for chmod(1)--added in r319642.

Remove chown-f_test (added in r268030) since the test coverage is now being
provided by `chown_test`.

MFC after: 1 month
Sponsored by: Dell EMC Isilon

Pedro F. Giffuni [Sun, 11 Jun 2017 19:09:10 +0000 (19:09 +0000)]

Remove unnecessary, and mismatched, comment.

Submitted by: Fedor Uporov

Jilles Tjoelker [Sun, 11 Jun 2017 19:06:07 +0000 (19:06 +0000)]

rc.subr: Optimize repeated sourcing.

When /etc/rc runs all /etc/rc.d scripts, it has already loaded /etc/rc.subr
but each /etc/rc.d script sources it again (since /etc/rc.d scripts must
also work when started stand-alone).

Therefore, if rc.subr is already loaded, return so sh need not parse the
rest of the file.

A second effect is that there is no longer a compound command around most of
rc.subr. This reduces memory usage while sh is loading rc.subr for the first
time (but this memory is free()d once rc.subr is loaded).

For purposes of porting this to other systems, I do not recommend porting
this to systems with shells that do not have the change to the return
special builtin like in r255215 (before FreeBSD 10.0-RELEASE). This change
ensures that return in the top level of a dot script returns from the dot
script, even if the dot script was sourced from a function.

A comparison of CPU time on an amd64 bhyve virtual machine from a times
command added near the end of /etc/rc, all four values summed:

x orig1
+ quickreturn
+--------------------------------------------------------------------------+
|  +    +              +                             x    x               x|
||______M__A_________|                             |______M___A__________| |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3         1.704         1.802         1.726         1.744   0.051419841
+   3         1.467         1.559         1.487     1.5043333   0.048387326
Difference at 95.0% confidence
-0.239667 +/- 0.113163
-13.7424% +/- 6.48873%
(Student's t, pooled s = 0.0499266)
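
The summary figures above can be re-derived by hand. Because N is 3, the Min,
Median, and Max columns are the three samples of each run. The sketch below is
an illustration in Python, not the tool that produced the table: it recomputes
the difference of means, the pooled standard deviation, and the 95% half-width.
The critical value 2.776 (Student's t, 4 degrees of freedom) is an assumption
taken from a standard t-table, and the function name is made up for this
example.

```python
import math
import statistics

# With N = 3, Min/Median/Max are the three samples of each run.
ORIG1 = [1.704, 1.726, 1.802]
QUICKRETURN = [1.467, 1.487, 1.559]

# Two-sided 95% critical value of Student's t for 3 + 3 - 2 = 4 degrees
# of freedom, from a standard t-table (assumption, not from the commit).
T_CRIT_95_DF4 = 2.776


def compare(base, new, t_crit):
    """Return (difference of means, CI half-width, pooled stddev)."""
    n1, n2 = len(base), len(new)
    var1 = statistics.variance(base)  # sample variance (n - 1 divisor)
    var2 = statistics.variance(new)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    diff = statistics.mean(new) - statistics.mean(base)
    half_width = t_crit * pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return diff, half_width, pooled_sd


diff, half_width, pooled_sd = compare(ORIG1, QUICKRETURN, T_CRIT_95_DF4)
base_mean = statistics.mean(ORIG1)
print(f"difference: {diff:.6f} +/- {half_width:.6f}")
print(f"relative:   {100 * diff / base_mean:.4f}%")
print(f"pooled s:   {pooled_sd:.7f}")
```

The relative figures in the table are the absolute ones divided by the mean of
the original run (1.744).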

Pedro F. Giffuni [Sun, 11 Jun 2017 19:05:45 +0000 (19:05 +0000)]

extfs: fix the build with no UFS_ACL.

Some people may want to drop UFS-style ACLs for slimmer kernels.
Let's just not assume everyone needs ACLs.

Reported by: bde
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D11145

Jilles Tjoelker [Sun, 11 Jun 2017 16:54:04 +0000 (16:54 +0000)]

sh: Enable interrupts before executing EXIT trap and doing final flush.

Konstantin Belousov [Sun, 11 Jun 2017 14:39:08 +0000 (14:39 +0000)]

More accurately handle early EFER restoration on resume.

Do not try to set LMA bit while CPU is still in legacy mode.
Apparently Intel CPUs ignore non-id writes to LMA, while AMD's
(over-)react with #GP.

Reported and tested by: danfe
Sponsored by: The FreeBSD Foundation
MFC after: 3 days

Sevan Janiyan [Sun, 11 Jun 2017 14:33:16 +0000 (14:33 +0000)]

Tidy up minor nits raised by mandoc lint:
Zap trailing whitespace and double spaces.
Remove an extra comma which is not required.
Bump date.

Reviewed by: gnn
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11142

Cy Schubert [Sun, 11 Jun 2017 04:03:09 +0000 (04:03 +0000)]

Flag loadpoolfile() (ippool -f) command line syntax errors.

Cy Schubert [Sun, 11 Jun 2017 04:00:26 +0000 (04:00 +0000)]

Identify poolstats() (ippool -s) command line syntax errors.

Cy Schubert [Sun, 11 Jun 2017 03:56:13 +0000 (03:56 +0000)]

Identify command line syntax errors in poolflush() (ippool -F).

Ian Lepore [Sun, 11 Jun 2017 00:44:19 +0000 (00:44 +0000)]

Convert from local code and constants for mac<->phy connection type to new
common fdt helper code.

Ian Lepore [Sun, 11 Jun 2017 00:38:16 +0000 (00:38 +0000)]

Add a driver for the Vitesse/Microsemi VSC8501 PHY.

Ian Lepore [Sun, 11 Jun 2017 00:16:21 +0000 (00:16 +0000)]

Add some utility functions to help a PHY driver on an FDT-configured
system retrieve its config data from the fdt data.

The properties that are common to all phys are decoded and returned in a
structure. The fdt node handles for the mac and phy devices are also
returned in the config data struct, so a driver can easily obtain additional
hardware-specific config values from the fdt data.

Ian Lepore [Sat, 10 Jun 2017 23:55:13 +0000 (23:55 +0000)]

Add a set of constants describing the ways a MAC and PHY can be connected.

While the initial need for this is to help support phy drivers which are
configured with FDT data, there is nothing devicetree-specific about the
concept or the names, so they are available for use even on non-FDT systems.

The initial list of connection types comes from the current devicetree
bindings documentation. Values not documented there can be added to the
list in the future as needed, and the values could be sorted into a
different order without perturbing FDT code. The only invariant is that
MII_CONTYPE_UNKNOWN should be first, so that it has a value of zero and a
con-type variable in a softc, for example, is initialized to
MII_CONTYPE_UNKNOWN by default.
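
The zero-value invariant can be illustrated outside the kernel. The following
is a minimal Python sketch, not the driver code: only the UNKNOWN member and
its position come from the message above, while the other enum members, the
`ExampleSoftc` structure, and `contype_of()` are hypothetical names introduced
for the example. It relies on ctypes zero-filling new structure instances,
which stands in for a zeroed softc.

```python
import ctypes
import enum


class MiiContype(enum.IntEnum):
    # Only UNKNOWN and its position come from the commit message; the other
    # members are hypothetical placeholders for this sketch.
    UNKNOWN = 0  # must stay first so that zeroed storage decodes to it
    EXAMPLE_A = 1
    EXAMPLE_B = 2


class ExampleSoftc(ctypes.Structure):
    # Hypothetical stand-in for a driver softc holding a con-type field.
    _fields_ = [("contype", ctypes.c_int)]


def contype_of(softc):
    """Decode the raw integer stored in the softc into the enum."""
    return MiiContype(softc.contype)


sc = ExampleSoftc()  # ctypes zero-fills new structure instances
print(contype_of(sc).name)
```

Reordering the remaining members would change their numeric values but would
not affect what a never-assigned field decodes to.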

Ian Lepore [Sat, 10 Jun 2017 23:45:26 +0000 (23:45 +0000)]

Allow building if_ffec as a module.

Ian Lepore [Sat, 10 Jun 2017 23:26:25 +0000 (23:26 +0000)]

if_ffec bugfixes related to harvesting of hardware-maintained statistics...

After harvesting the hardware statistics counters and summing them into the
interface stats, properly clear the hardware counters back to zero.  On imx5
and earlier hardware it is necessary to disable collection of stats while
writing zeroes to all the registers.  On imx6 and newer it turns out it's
not even possible to write zeroes, instead you have to toggle a special
"zero everything" control bit in a register.

Count incoming packets with a bad start frame delim as input errors, and
incoming packets dropped due to no fifo space as input drops.

Remove all code related to harvesting the hardware stats less often than
once per second.  It turns out the 32-bit stats registers are backed by
16-bit counters under the hood, and they can easily roll over if you only
harvest them once every 3 seconds like the old code was doing.  Now we just
read all the regs once a second.

The combination of not properly zeroing the stats registers and 16-bit
counters sometimes wrapping between harvest calls resulted in basically
unusable statistics before these changes.
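
To see why the harvest interval matters with 16-bit counters, consider the
arithmetic below. This is a simplified Python model, not the driver: it assumes
a counter that starts at zero after each harvest, wraps modulo 2**16, and is
read exactly once per interval. The 30000 events/second rate and both function
names are hypothetical.

```python
COUNTER_BITS = 16  # width of the hidden counters, per the commit message
COUNTER_MOD = 1 << COUNTER_BITS


def harvested_delta(events_per_second, interval_seconds):
    """Events a driver would see after one harvest interval.

    Assumes the counter starts at zero, wraps modulo 2**16, and is read
    exactly once per interval.
    """
    true_count = events_per_second * interval_seconds
    return true_count % COUNTER_MOD


def max_safe_rate(interval_seconds):
    """Highest whole events/second rate that cannot wrap in one interval."""
    return (COUNTER_MOD - 1) // interval_seconds


for interval in (1, 3):
    seen = harvested_delta(30000, interval)
    print(f"interval={interval}s true={30000 * interval} seen={seen} "
          f"max safe rate={max_safe_rate(interval)}/s")
```

Under these assumptions a 3-second interval tolerates only about a third of
the event rate that a 1-second interval does before counts are silently lost.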

Cy Schubert [Sat, 10 Jun 2017 23:16:00 +0000 (23:16 +0000)]

Remove redundant assignment of infile from optarg in loadpoolfile()
which was previously assigned from optarg in the argument list from
main().

Mark Johnston [Sat, 10 Jun 2017 21:13:39 +0000 (21:13 +0000)]

List DTrace provider pages in a single variable.

MFC after: 1 week

Mark Johnston [Sat, 10 Jun 2017 21:07:55 +0000 (21:07 +0000)]

Remove an inaccuracy from socket.2.

SOCK_SEQPACKET is implemented for several protocols.

MFC after: 1 week

Enji Cooper [Sat, 10 Jun 2017 20:56:31 +0000 (20:56 +0000)]

Improve handling with system state

- Always unlink $cmd after exit via END block.
- The tests don't function well if kern.geom.debugflags != 0. Save debugflags,
then restore them at the end of the test.

MFC after: 1 month
Sponsored by: Dell EMC Isilon

George V. Neville-Neil [Sat, 10 Jun 2017 20:50:50 +0000 (20:50 +0000)]

Update the variables as well.

George V. Neville-Neil [Sat, 10 Jun 2017 20:47:37 +0000 (20:47 +0000)]

Update Makefile to contain the new DTrace lockstat manual page.

George V. Neville-Neil [Sat, 10 Jun 2017 20:41:53 +0000 (20:41 +0000)]

Manual page for the DTrace lockstat provider

Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11128

Andriy Gapon [Sat, 10 Jun 2017 20:38:52 +0000 (20:38 +0000)]

fstat: catch up with r318997 and use 64 bits to store fsid

Discussed with: kib

Enji Cooper [Sat, 10 Jun 2017 19:48:45 +0000 (19:48 +0000)]

Don't explicitly set the class to PART in gctl_test_helper.c

This will allow the tool to be used with arbitrary geom(4) classes, like GEOM.

Specify class=PART explicitly in the tester to keep existing behavior.

MFC after: 1 month
Sponsored by: Dell EMC Isilon

Edward Tomasz Napierala [Sat, 10 Jun 2017 19:05:45 +0000 (19:05 +0000)]

Switch the example name for variables controlling loading memory images
in /boot/defaults/loader.conf to something that's actually commonly used,
"mdroot". It's arbitrary, but it's easier to find this way.

MFC after: 2 weeks

Eric Joyner [Sat, 10 Jun 2017 18:56:30 +0000 (18:56 +0000)]

ixl(4)/ixlv(4): Fix some busdma tags and improper map NULL.

Description from Brett:

"The busdma tags used to create mappings for the tx and rx rings did not have
the device's tag as parents, meaning that they did not respect the device's
busdma properties. The other tags used in the driver had their parents set
appropriately.

Also, the dma maps for each buffer in ixl_txeof() were being NULLed after
being unloaded, which is an error because those maps are then reused without
being recreated (I believe this also leaked resources since the maps were not
destroyed). Simply removing the line that sets the maps to NULL gives the
desired behavior. There does not seem to be a similar problem with ixl_rxeof().
Functions to free the tx and rx rings also NULL out the dma maps for each
buffer, but this seems okay because the maps are destroyed and not reused in
this case.

With these fixes, my ixl card seems to be working with the IOMMU enabled."

Submitted by: Brett Gutstein <bgutstein@rice.edu>
Reviewed by: erj
Approved by: Alan Cox <alc@rice.edu>
MFC after: 1 week

Dimitry Andric [Sat, 10 Jun 2017 18:52:13 +0000 (18:52 +0000)]

Remove a few unneeded files from libllvm, libclang and liblldb.

MFC after: 3 days

Cy Schubert [Sat, 10 Jun 2017 17:05:14 +0000 (17:05 +0000)]

Disable the -O (output fields) option in poollist() (ippool -l) for
now. The option does not presently work. However, similar functions in
ipfstat (for state) and ipnat (for nat) do work and provide outputs that
can be easily parsed by shell scripts or subsequently loaded into CSV
files. The intention here is to return to this option to make it work.
I suspect the problem is in printpoolfields.c.

Cy Schubert [Sat, 10 Jun 2017 16:42:39 +0000 (16:42 +0000)]

Flag poollist() (ippool -l) command line syntax errors.

Alan Cox [Sat, 10 Jun 2017 16:11:39 +0000 (16:11 +0000)]

Remove an unnecessary field from struct blist.  (The comment describing
what this field represented was also inaccurate.)  Suggested by: kib

In r178792, blist_create() grew a malloc flag, allowing M_NOWAIT to be
specified.  However, blist_create() was not modified to handle the
possibility that a malloc() call failed.  Address this omission.

Increase the width of the local variable "radix" to 64 bits.  (This
matches the width of the corresponding field in struct blist.)

Reviewed by: kib
MFC after: 6 weeks

Mark Johnston [Sat, 10 Jun 2017 14:47:01 +0000 (14:47 +0000)]

Override the locale so that file lists get a consistent sort order.

Reported by: avg
MFC after: 1 week

Edward Tomasz Napierala [Sat, 10 Jun 2017 08:25:46 +0000 (08:25 +0000)]

Remove mentions of recently removed /usr/share/doc/ subdirectories
from hier(7).

Edward Tomasz Napierala [Sat, 10 Jun 2017 08:08:14 +0000 (08:08 +0000)]

/usr/share/doc/bind has been gone since 20040925.

MFC after: 2 weeks

Edward Tomasz Napierala [Sat, 10 Jun 2017 08:01:05 +0000 (08:01 +0000)]

Improve formatting by removing yet another case of '-width ".Pa'.

MFC after: 2 weeks

Edward Tomasz Napierala [Sat, 10 Jun 2017 07:47:21 +0000 (07:47 +0000)]

Remove /usr/include/readline/ from hier(7); it's long gone.

Edward Tomasz Napierala [Sat, 10 Jun 2017 07:43:24 +0000 (07:43 +0000)]

Remove groff(1) leftovers from hier(7).

Andriy Gapon [Sat, 10 Jun 2017 06:13:52 +0000 (06:13 +0000)]

follow up to r319746: add the new test files to the make file

Reported by: markj
MFC after: 2 days
X-MFC with: r319746

John Baldwin [Sat, 10 Jun 2017 01:32:35 +0000 (01:32 +0000)]

Decode arguments to rtprio() and rtprio_thread().

John Baldwin [Sat, 10 Jun 2017 01:32:18 +0000 (01:32 +0000)]

Decode arguments to rtprio_thread() (same as rtprio()).

John Baldwin [Sat, 10 Jun 2017 01:22:40 +0000 (01:22 +0000)]

Decode the 'howto' argument to reboot().

John Baldwin [Sat, 10 Jun 2017 01:20:08 +0000 (01:20 +0000)]

Improve decoding of RB_AUTOBOOT in the 'howto' argument to reboot().

The reboot() system call accepts a mode (RB_AUTOBOOT, RB_HALT, RB_POWEROFF,
or RB_REROOT) as well as zero or more optional flags in 'howto'.
However, RB_AUTOBOOT was only displayed if 'howto' was exactly 0.
Combinations like 'RB_AUTOBOOT | RB_DUMP' were decoded as 'RB_DUMP'.
Instead, imply that RB_AUTOBOOT was specified if none of the other "mode"
flags were specified.
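
The decoding rule described above can be sketched as follows. This is a Python
illustration of the logic, not the decoder the commit changed. The numeric
values are believed to match FreeBSD's <sys/reboot.h> but should be treated as
assumptions, the flag table is deliberately incomplete, and `decode_howto()` is
a hypothetical name.

```python
# Values believed to match FreeBSD's <sys/reboot.h>; treat them as
# assumptions and check the header before relying on them.
RB_AUTOBOOT = 0x0
RB_HALT = 0x008
RB_DUMP = 0x100
RB_POWEROFF = 0x4000
RB_REROOT = 0x200000

MODES = {RB_HALT: "RB_HALT", RB_POWEROFF: "RB_POWEROFF", RB_REROOT: "RB_REROOT"}
FLAGS = {RB_DUMP: "RB_DUMP"}  # deliberately incomplete: one flag is enough here


def decode_howto(howto):
    """Render 'howto' symbolically, implying RB_AUTOBOOT if no mode bit is set."""
    names = []
    remaining = howto
    for table in (MODES, FLAGS):
        for bit, name in table.items():
            if remaining & bit:
                names.append(name)
                remaining &= ~bit
    if not howto & (RB_HALT | RB_POWEROFF | RB_REROOT):
        names.insert(0, "RB_AUTOBOOT")
    if remaining:
        names.append(hex(remaining))  # bits this sketch does not know about
    return "|".join(names)


print(decode_howto(RB_AUTOBOOT | RB_DUMP))
```

Because RB_AUTOBOOT is zero, it can never be detected by testing a bit; it has
to be inferred from the absence of the other mode bits.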

John Baldwin [Sat, 10 Jun 2017 00:53:00 +0000 (00:53 +0000)]

Decode the arguments to quotactl().

John Baldwin [Sat, 10 Jun 2017 00:45:07 +0000 (00:45 +0000)]

Decode the arguments to ptrace().

This does not decode structures returned by ptrace().

John Baldwin [Sat, 10 Jun 2017 00:37:02 +0000 (00:37 +0000)]

Decode arguments to getpriority() and setpriority().

John Baldwin [Sat, 10 Jun 2017 00:35:45 +0000 (00:35 +0000)]

Fix decoding of setpriority() arguments.

The PRIO_* 'which' value is stored in the first argument to setpriority(2),
not the last. While here, decode the arguments to getpriority(2).

Luiz Otavio O Souza [Fri, 9 Jun 2017 20:38:18 +0000 (20:38 +0000)]

Remove an unnecessary variable from the switch softc structure and make the
functions that are used as booleans return real boolean values.

Sponsored by: Rubicon Communications, LLC (Netgate)

Justin Hibbits [Fri, 9 Jun 2017 20:26:42 +0000 (20:26 +0000)]

Follow up r313841 on powerpc

Close a potential race in reading the CPU dtrace flags, where a thread can
start on one CPU, and partway through retrieving the flags be swapped out,
while another thread traps and sets the CPU_DTRACE_NOFAULT. This could
cause the first thread to return without handling the fault.

Discussed with: markj@

Mark Johnston [Fri, 9 Jun 2017 19:57:27 +0000 (19:57 +0000)]

Implement pci_disable_device() in the LinuxKPI.

Submitted by: kmacy
MFC after: 2 weeks

Mark Johnston [Fri, 9 Jun 2017 19:41:12 +0000 (19:41 +0000)]

Augment wait queue support in the LinuxKPI.

In particular:
- Don't evaluate event conditions with a sleepqueue lock held, since such
  code may attempt to acquire arbitrary locks.
- Fix the return value for wait_event_interruptible() in the case that the
  wait is interrupted by a signal.
- Implement wait_on_bit_timeout() and wait_on_atomic_t().
- Implement some functions used to test for pending signals.
- Implement a number of wait_event_*() variants and unify the existing
  implementations.
- Unify the mechanism used by wait_event_*() and schedule() to put the
  calling thread to sleep.

This is required to support updated DRM drivers. Thanks to hselasky for
finding and fixing a number of bugs in the original revision.

Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10986

Alan Cox [Fri, 9 Jun 2017 17:19:27 +0000 (17:19 +0000)]

Style and comment fixes only.

Reviewed by: kib
MFC after: 6 weeks

Alan Cox [Fri, 9 Jun 2017 16:19:24 +0000 (16:19 +0000)]

blist_fill()'s return type is too narrow.  blist_fill() accepts a 64-bit
quantity as the size of the range to fill, but returns a 32-bit quantity
as the number of blocks that were allocated to fill that range.  This
revision corrects that mismatch.  Currently, swaponsomething() limits
the size of a swap area to prevent arithmetic overflow in
other parts of the blist allocator.  That limit has also prevented this
type mismatch from causing problems.

Reviewed by: kib, markj
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D11096
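
The effect of returning a 64-bit count through a 32-bit type can be shown with
a small Python sketch. It is only an illustration: it assumes the old return
type was a signed 32-bit integer (the message says only "32-bit"), the helper
names are hypothetical, and the block count is an invented value just past the
32-bit range.

```python
import ctypes


def as_int32(value):
    """Value seen by a caller if a count is returned through a signed 32-bit int.

    ctypes integer types do no overflow checking, so this keeps only the
    low 32 bits, much as a narrowing conversion in C commonly does.
    """
    return ctypes.c_int32(value).value


def as_int64(value):
    """Value seen by a caller if the count is returned through a 64-bit type."""
    return ctypes.c_int64(value).value


# Hypothetical block count just past the 32-bit range.
blocks_filled = (1 << 32) + 5
print(as_int32(blocks_filled), as_int64(blocks_filled))
```

As the message notes, the existing limit on swap area size keeps real counts
below the range where this truncation would occur.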

Gleb Smirnoff [Fri, 9 Jun 2017 15:54:48 +0000 (15:54 +0000)]

Fix stat(2) on a listening socket.

Andrew Turner [Fri, 9 Jun 2017 15:47:14 +0000 (15:47 +0000)]

Allow the arm64 machine/vfp.h to be included without first including
machine/pcb.h. The latter is only needed for struct pcb.

Andrew Turner [Fri, 9 Jun 2017 15:37:17 +0000 (15:37 +0000)]

Store the read-only thread pointer when scheduling a new thread. This is
not currently set, however we may wish to set it later.

Andriy Gapon [Fri, 9 Jun 2017 15:30:41 +0000 (15:30 +0000)]

MFV r319740: 8168 NULL pointer dereference in zfs_create()

illumos/illumos-gate@690031d326342fa4ea28b5e80f1ad6a16281519d
https://github.com/illumos/illumos-gate/commit/690031d326342fa4ea28b5e80f1ad6a16281519d

https://www.illumos.org/issues/8168
  If we manage to export the pool on which we are creating a dataset (filesystem
  or zvol) between entering libzfs`zfs_create() and libzfs`zpool_open() call (for
  which we never check the return value) we end up dereferencing a NULL pointer
  in libzfs`zpool_close().
  This was discovered on ZFS on Linux. The same issue can be reproduced on
  Illumos running in parallel:
    while :; do zpool import -d /tmp testpool ; zpool export testpool ; done
    while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done
  Eventually this will result in several core dumps like this one:
  [root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244
  Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
  libnvpair.so.1 ld.so.1 ]
  > ::stack
  libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450)
  libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8)
  zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3)
  main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70)
  _start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b)
  >
  Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
MFC after: 2 weeks

Andriy Gapon [Fri, 9 Jun 2017 15:28:57 +0000 (15:28 +0000)]

MFV r319741: 8156 dbuf_evict_notify() does not need dbuf_evict_lock

illumos/illumos-gate@dbfd9f930004c390a2ce2cf850c71b4f880eef9c
https://github.com/illumos/illumos-gate/commit/dbfd9f930004c390a2ce2cf850c71b4f880eef9c

https://www.illumos.org/issues/8156
  dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do
  the eviction itself (because the evict thread is not able to keep up).
  This can result in massive lock contention.
  It isn't necessary to hold the lock, because if we make the wrong choice
  occasionally, nothing bad will happen.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 1 week

Andriy Gapon [Fri, 9 Jun 2017 15:27:22 +0000 (15:27 +0000)]

MFV r319739: 8005 poor performance of 1MB writes on certain RAID-Z configurations

illumos/illumos-gate@5b062782532a1d5961c4a4b655906e1238c7c908
https://github.com/illumos/illumos-gate/commit/5b062782532a1d5961c4a4b655906e1238c7c908

https://www.illumos.org/issues/8005
  RAID-Z requires that space be allocated in multiples of P+1 sectors,
  because this is the minimum size block that can have the required amount
  of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2
  sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector
  is a unit of 2^ashift bytes, typically 512B or 4KB.
  To satisfy this constraint, the allocation size is rounded up to the
  proper multiple, resulting in up to 3 "pad sectors" at the end of some
  blocks. The contents of these pad sectors are not used, so we do not
  need to read or write these sectors. However, some storage hardware
  performs much worse (around 1/2 as fast) on mostly-contiguous writes
  when there are small gaps of non-overwritten data between the writes.
  Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
  include pad sectors. If writing a pad sector will fill the gap between
  two (required) writes, we will issue the optional zio, thus doubling
  performance. The gap-filling performance improvement was introduced in
  July 2009.
  Writing the optional zio is done by the io aggregation code in
  vdev_queue.c. The problem is that it is also subject to the limit on
  the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
  default 128KB. For a given block, if the amount of data plus padding
  written to a leaf device exceeds zfs_vdev_aggregation_limit, the
  optional zio will not be written, resulting in a ~2x performance
  degradation.
  The problem occurs only for certain values of ashift, compressed block
  size, and RAID-Z configuration (number of parity and data disks). It
  cannot occur with the default recordsize=128KB. If compression is
  enabled, all configurations with recordsize=1MB or larger will be
  impacted to some degree.
  The problem notably occurs with recordsize=1MB, compression=off, with 10
  disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 10 days
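
The P+1 rounding rule from the issue text above can be worked through in a few
lines. This is a Python sketch of the arithmetic only, not ZFS code; the
function names are hypothetical, and `sectors` stands for the size of a block
(data plus parity) before padding.

```python
def raidz_alloc_sectors(sectors, nparity):
    """Round an allocation up to the next multiple of (nparity + 1) sectors."""
    unit = nparity + 1
    return -(-sectors // unit) * unit  # ceiling division, then scale back up


def pad_sectors(sectors, nparity):
    """Number of unused pad sectors appended by the rounding."""
    return raidz_alloc_sectors(sectors, nparity) - sectors


for nparity in (1, 2, 3):
    worst = max(pad_sectors(s, nparity) for s in range(1, 100))
    print(f"RAIDZ{nparity}: multiple of {nparity + 1}, worst-case pad {worst}")
```

The worst case is always P pad sectors, which is where the "up to 3" figure
for RAIDZ3 comes from.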

Andriy Gapon [Fri, 9 Jun 2017 15:26:03 +0000 (15:26 +0000)]

MFV r319738: 8155 simplify dmu_write_policy handling of pre-compressed buffers

illumos/illumos-gate@adaec86ad212d9fd756bee322934fa54d1258605
https://github.com/illumos/illumos-gate/commit/adaec86ad212d9fd756bee322934fa54d1258605

https://www.illumos.org/issues/8155
  When writing pre-compressed buffers, arc_write() requires that the compression
  algorithm used to compress the buffer matches the compression algorithm
  requested by the zio_prop_t, which is set by dmu_write_policy().
  This makes dmu_write_policy() and its callers a bit more complicated.
  We can simplify this by making arc_write() trust the caller to supply the type
  of pre-compressed buffer that it wants to write, and override the compression
  setting in the zio_prop_t.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 10 days

Andriy Gapon [Fri, 9 Jun 2017 15:21:28 +0000 (15:21 +0000)]

remove an unrelated local change from r319746

MFC after: 1 day
X-MFC with: r319746

Andriy Gapon [Fri, 9 Jun 2017 15:16:39 +0000 (15:16 +0000)]

MFV r319744,r319745: 8269 dtrace stddev aggregation is normalized incorrectly

illumos/illumos-gate@79809f9cf402f130667349b2d4007ecd65d63c6f
https://github.com/illumos/illumos-gate/commit/79809f9cf402f130667349b2d4007ecd65d63c6f

https://www.illumos.org/issues/8269
  It seems that currently normalization of stddev aggregation is done
  incorrectly.
  We divide both the sum of values and the sum of their squares by the
  normalization factor. But we should divide the sum of squares by the
  normalization factor squared to scale the original values properly.

FreeBSD note: the actual change was committed in r316853; this commit
adds the test files and records merge information.

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 1 week
Sponsored by: Panzura
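
The normalization issue described above follows from how a standard deviation
is derived from running sums. The sketch below uses Python floats and the
identity sqrt(E[x^2] - E[x]^2); it is an assumption-laden illustration rather
than the DTrace implementation, and the sample values, the factor of 1000, and
the function names are all hypothetical.

```python
import math


def stddev_from_sums(total, total_sq, count):
    """Population standard deviation from running sums: sqrt(E[x^2] - E[x]^2)."""
    mean = total / count
    return math.sqrt(total_sq / count - mean * mean)


def normalized_stddev(values, factor, square_factor_for_squares):
    """Normalize the aggregated sums by 'factor', then derive the stddev."""
    total = sum(values) / factor
    sq_divisor = factor * factor if square_factor_for_squares else factor
    total_sq = sum(v * v for v in values) / sq_divisor
    return stddev_from_sums(total, total_sq, len(values))


samples = [1000, 2000, 3000, 4000]  # hypothetical values, e.g. nanoseconds
old = normalized_stddev(samples, 1000, square_factor_for_squares=False)
new = normalized_stddev(samples, 1000, square_factor_for_squares=True)
print(f"dividing squares by factor:    {old:.3f}")
print(f"dividing squares by factor**2: {new:.3f}")
```

Only the second form matches the standard deviation of the values 1, 2, 3, 4,
which is what normalizing each sample by 1000 should produce.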

Andriy Gapon [Fri, 9 Jun 2017 15:06:50 +0000 (15:06 +0000)]

fix up r319744, add new files

8269 dtrace stddev aggregation is normalized incorrectly

illumos/illumos-gate@79809f9cf402f130667349b2d4007ecd65d63c6f
https://github.com/illumos/illumos-gate/commit/79809f9cf402f130667349b2d4007ecd65d63c6f

https://www.illumos.org/issues/8269
  It seems that currently normalization of stddev aggregation is done
  incorrectly.
  We divide both the sum of values and the sum of their squares by the
  normalization factor. But we should divide the sum of squares by the
  normalization factor squared to scale the original values properly.

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

Andriy Gapon [Fri, 9 Jun 2017 15:04:10 +0000 (15:04 +0000)]

8269 dtrace stddev aggregation is normalized incorrectly

illumos/illumos-gate@79809f9cf402f130667349b2d4007ecd65d63c6f
https://github.com/illumos/illumos-gate/commit/79809f9cf402f130667349b2d4007ecd65d63c6f

https://www.illumos.org/issues/8269
  It seems that currently normalization of stddev aggregation is done
  incorrectly.
  We divide both the sum of values and the sum of their squares by the
  normalization factor. But we should divide the sum of squares by the
  normalization factor squared to scale the original values properly.

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

Andriy Gapon [Fri, 9 Jun 2017 15:03:07 +0000 (15:03 +0000)]

8108 zdb -l fails to read labels 2 and 3

illumos/illumos-gate@22c8b9583d07895c16549075a53668d7bc988cf3
https://github.com/illumos/illumos-gate/commit/22c8b9583d07895c16549075a53668d7bc988cf3

https://www.illumos.org/issues/8108

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

commit | commitdiff | tree

Andriy Gapon [Fri, 9 Jun 2017 15:02:07 +0000 (15:02 +0000)]

8056 zfs send size estimate is inaccurate for some zvols

illumos/illumos-gate@0255edcc85fc0cd1dda0e49bcd52eb66c06a1b16
https://github.com/illumos/illumos-gate/commit/0255edcc85fc0cd1dda0e49bcd52eb66c06a1b16

https://www.illumos.org/issues/8056
  The send size estimate for a zvol can be too low, if the size of the record
  headers (dmu_replay_record_t's) is a significant portion of the size.
  This is typically the case when the data is highly compressible, especially
  with embedded blocks.
  The problem is that dmu_adjust_send_estimate_for_indirects() assumes that
  blocks are the size of the "recordsize" property (128KB).
  However, for zvols, the blocks are the size of the "volblocksize" property
  (8KB). Therefore, we estimate that there will be 16x fewer record headers than
  there really will be.
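
The 16x figure follows from the ratio of the two block sizes. The Python sketch below works through the arithmetic for a hypothetical 1 GiB zvol, assuming one record header per block; the helper name is illustrative only.

```python
def estimated_record_headers(data_bytes, block_bytes):
    """Number of blocks (and so record headers, one per block) for the data."""
    return -(-data_bytes // block_bytes)  # ceiling division

RECORDSIZE = 128 * 1024    # block size assumed by the estimate
VOLBLOCKSIZE = 8 * 1024    # block size actually used by the zvol
data_bytes = 1 << 30       # hypothetical 1 GiB of zvol data

assumed = estimated_record_headers(data_bytes, RECORDSIZE)
actual = estimated_record_headers(data_bytes, VOLBLOCKSIZE)
ratio = actual // assumed
print(assumed, actual, ratio)
```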

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>

commit | commitdiff | tree

Andriy Gapon [Fri, 9 Jun 2017 15:01:18 +0000 (15:01 +0000)]

8156 dbuf_evict_notify() does not need dbuf_evict_lock

illumos/illumos-gate@dbfd9f930004c390a2ce2cf850c71b4f880eef9c
https://github.com/illumos/illumos-gate/commit/dbfd9f930004c390a2ce2cf850c71b4f880eef9c

https://www.illumos.org/issues/8156
  dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do
  the eviction itself (because the evict thread is not able to keep up).
  This can result in massive lock contention.
  It isn't necessary to hold the lock, because if we make the wrong choice
  occasionally, nothing bad will happen.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

commit | commitdiff | tree

Andriy Gapon [Fri, 9 Jun 2017 15:00:13 +0000 (15:00 +0000)]

8168 NULL pointer dereference in zfs_create()

illumos/illumos-gate@690031d326342fa4ea28b5e80f1ad6a16281519d
https://github.com/illumos/illumos-gate/commit/690031d326342fa4ea28b5e80f1ad6a16281519d

https://www.illumos.org/issues/8168
  If we manage to export the pool on which we are creating a dataset (filesystem
  or zvol) between entering libzfs`zfs_create() and the libzfs`zpool_open() call
  (for which we never check the return value), we end up dereferencing a NULL
  pointer in libzfs`zpool_close().
  This was discovered on ZFS on Linux. The same issue can be reproduced on
  Illumos by running the following two loops in parallel:
    while :; do zpool import -d /tmp testpool ; zpool export testpool ; done
    while :; do zfs create testpool/fs; zfs destroy testpool/fs ; done
  Eventually this will result in several core dumps like this one:
  [root@52-54-00-d3-7a-01 /cores]# mdb core.zfs.4244
  Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
  libnvpair.so.1 ld.so.1 ]
  > ::stack
  libzfs.so.1`zpool_close+0x17(0, 0, 0, 8047450)
  libzfs.so.1`zfs_create+0x1bb(8090548, 8047e6f, 1, 808cba8)
  zfs_do_create+0x545(2, 8047d74, 80778a0, 801, 0, 3)
  main+0x22c(8047d2c, fef5c6e8, 8047d64, 8055a17, 3, 8047d70)
  _start+0x83(3, 8047e64, 8047e68, 8047e6f, 0, 8047e7b)
  >
  Fix and reproducer (systemtap): https://github.com/zfsonlinux/zfs/pull/6096

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>

commit | commitdiff | tree

Andriy Gapon [Fri, 9 Jun 2017 14:58:51 +0000 (14:58 +0000)]

8005 poor performance of 1MB writes on certain RAID-Z configurations

illumos/illumos-gate@5b062782532a1d5961c4a4b655906e1238c7c908
https://github.com/illumos/illumos-gate/commit/5b062782532a1d5961c4a4b655906e1238c7c908

https://www.illumos.org/issues/8005
  RAID-Z requires that space be allocated in multiples of P+1 sectors,
  because this is the minimum size block that can have the required amount
  of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2
  sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector
  is a unit of 2^ashift bytes, typically 512B or 4KB.
  To satisfy this constraint, the allocation size is rounded up to the
  proper multiple, resulting in up to 3 "pad sectors" at the end of some
  blocks. The contents of these pad sectors are not used, so we do not
  need to read or write these sectors. However, some storage hardware
  performs much worse (around 1/2 as fast) on mostly-contiguous writes
  when there are small gaps of non-overwritten data between the writes.
  Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
  include pad sectors. If writing a pad sector will fill the gap between
  two (required) writes, we will issue the optional zio, thus doubling
  performance. The gap-filling performance improvement was introduced in
  July 2009.
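
The rounding rule described above can be sketched in Python as follows. This is a simplified model, assuming one parity sector per parity disk for every row of data sectors; the function name and its return shape are hypothetical.

```python
def raidz_alloc_sectors(psize, ashift, ndisks, nparity):
    """Return (data, parity, pad) sector counts for one RAID-Z block.

    psize   -- block size in bytes
    ashift  -- log2 of the sector size
    ndisks  -- total disks in the RAID-Z group (data + parity)
    nparity -- number of parity disks (1, 2, or 3)
    """
    sector = 1 << ashift
    data = -(-psize // sector)                   # data sectors, rounded up
    rows = -(-data // (ndisks - nparity))        # rows across the data disks
    parity = nparity * rows                      # parity sectors, one set per row
    unpadded = data + parity
    multiple = nparity + 1                       # allocation unit: P+1 sectors
    padded = -(-unpadded // multiple) * multiple
    return data, parity, padded - unpadded

# 1MB block, 4KB sectors, 10 disks
z2 = raidz_alloc_sectors(1 << 20, 12, 10, 2)
z3 = raidz_alloc_sectors(1 << 20, 12, 10, 3)
print(z2, z3)
```

Under these assumptions both the RAIDZ2 and RAIDZ3 layouts end with one pad sector, and the pad count can never exceed the number of parity disks.
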
  Writing the optional zio is done by the io aggregation code in
  vdev_queue.c. The problem is that it is also subject to the limit on
  the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
  default 128KB. For a given block, if the amount of data plus padding
  written to a leaf device exceeds zfs_vdev_aggregation_limit, the
  optional zio will not be written, resulting in a ~2x performance
  degradation.
  The problem occurs only for certain values of ashift, compressed block
  size, and RAID-Z configuration (number of parity and data disks). It
  cannot occur with the default recordsize=128KB. If compression is
  enabled, all configurations with recordsize=1MB or larger will be
  impacted to some degree.
  The problem notably occurs with recordsize=1MB, compression=off, with 10
  disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

commit | commitdiff | tree

Andriy Gapon [Fri, 9 Jun 2017 14:57:45 +0000 (14:57 +0000)]

8155 simplify dmu_write_policy handling of pre-compressed buffers

illumos/illumos-gate@adaec86ad212d9fd756bee322934fa54d1258605
https://github.com/illumos/illumos-gate/commit/adaec86ad212d9fd756bee322934fa54d1258605

https://www.illumos.org/issues/8155
  When writing pre-compressed buffers, arc_write() requires that the compression
  algorithm used to compress the buffer matches the compression algorithm
  requested by the zio_prop_t, which is set by dmu_write_policy().
  This makes dmu_write_policy() and its callers a bit more complicated.
  We can simplify this by making arc_write() trust the caller to supply the type
  of pre-compressed buffer that it wants to write, and override the compression
  setting in the zio_prop_t.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

commit | commitdiff | tree

Andriy Gapon [Fri, 9 Jun 2017 14:56:17 +0000 (14:56 +0000)]

6939 add sysevents to zfs core for commands

illumos/illumos-gate@ce1577b04976f1d8bb5f235b6eaaab15b46a3068
https://github.com/illumos/illumos-gate/commit/ce1577b04976f1d8bb5f235b6eaaab15b46a3068

https://www.illumos.org/issues/6939
  Originally created https://smartos.org/bugview/OS-4489
       sysevents should be fired in the kernel from ZFS whenever a command
       is run that is logged in zpool history.
  Example output
  Terminal 1
  root - gz sunos ~ # zfs create zones/foobar
  root - gz sunos ~ # zfs set quota=10g zones/foobar
  root - gz sunos ~ # zfs destroy zones/foobar
  Terminal 2
  root - gz sunos ~ # sysevent EC_zfs
  nvlist version: 0
      date = 2016-04-28T14:50:08.964Z
      vendor = SUNW
      publisher = zfs
      class = EC_zfs
      subclass = ESC_ZFS_history_event
      pid = 0
      data = (embedded nvlist)
      nvlist version: 0
          pool_name = zones
          pool_guid = 0x40c964e8f9a7a694
          history_record = (embedded nvlist)
          nvlist version: 0
              dsname = zones/foobar
              dsid = 0x1525
              history internal str =
              internal_name = create
              history txg = 0x4c4ef3

Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Joshua M. Clulow <jmc@joyent.com>
Reviewed by: Josh Wilsdon <jwilsdon@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Alan Somers <asomers@gmail.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Dave Eddy <dave@daveeddy.com>

commit | commitdiff | tree

Andriy Gapon [Fri, 9 Jun 2017 14:55:12 +0000 (14:55 +0000)]

6396 remove SVM

illumos/illumos-gate@5f10ef697f250374b7b917e10961c4e02d4e3112
https://github.com/illumos/illumos-gate/commit/5f10ef697f250374b7b917e10961c4e02d4e3112

https://www.illumos.org/issues/6396
LVM = SVM = Solaris Volume Manager.
It is dead code and is not used on ZFS-based platforms.

Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

commit | commitdiff | tree

Konstantin Belousov [Fri, 9 Jun 2017 12:06:22 +0000 (12:06 +0000)]

Remove msdosfs -o large support.

Its purpose was to translate the values of msdosfs inode numbers,
which are calculated from the msdosfs structures describing the file,
into the range representable by a 32-bit ino_t.  The translation was
active for filesystems larger than 128GB; it reserved the range 0xf0000000
(FILENO_FIRST_DYN) to UINT32_MAX and remembered some arbitrary
translation of ino >= FILENO_FIRST_DYN into this range.  It consumed
memory that could only be freed by unmount, and the translation was
not stable across remounts.

With the ino_t type extended to 64 bits, there is no such issue and values
can be returned without compaction to 32 bits.  That is, for the native
environments, the translation layer is not necessary and adds
significant undeserved code complexity.  For compat ABIs which use a
32-bit ino_t, the vfs.ino64_trunc_error sysctl provides some measures
to soften the failure mode when inode number truncation is not safe.

Discussed with: bde
Sponsored by: The FreeBSD Foundation

commit | commitdiff | tree

Konstantin Belousov [Fri, 9 Jun 2017 11:17:08 +0000 (11:17 +0000)]

Enhance vfs.ino64_trunc_error sysctl.

Provide a new mode "2" which returns a special overflow indicator in
the non-representable field instead of the silent truncation (mode
"0") or EOVERFLOW (mode "1").

In particular, the typical use of st_ino to detect hard links with
mode "2" reports false positives, which might be more suitable for
some uses.
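
The three modes can be modelled with a short Python sketch. It is an illustration only: the function name is hypothetical, and the use of UINT32_MAX as the "special overflow indicator" is an assumption made here, since the message does not name the value.

```python
import errno
import os

UINT32_MAX = 0xFFFFFFFF
OVERFLOW_INDICATOR = UINT32_MAX  # assumed value, not stated in the commit message

def truncate_ino(ino, mode):
    """Model how a 64-bit inode number is reported through a 32-bit field."""
    if ino <= UINT32_MAX:
        return ino                      # representable, no special handling
    if mode == 0:
        return ino & UINT32_MAX         # silent truncation
    if mode == 1:
        raise OSError(errno.EOVERFLOW, os.strerror(errno.EOVERFLOW))
    if mode == 2:
        return OVERFLOW_INDICATOR       # special overflow indicator
    raise ValueError("unknown mode")

file_a = 0x100000005
file_b = 0x200000009

# In mode 2 two unrelated files report the same st_ino, so a tool that
# detects hard links by comparing st_ino sees a false positive.
same_in_mode2 = truncate_ino(file_a, 2) == truncate_ino(file_b, 2)
print(truncate_ino(file_a, 0), same_in_mode2)
```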

Discussed with: bde
Sponsored by: The FreeBSD Foundation

commit | commitdiff | tree

Andriy Voskoboinyk [Fri, 9 Jun 2017 07:08:58 +0000 (07:08 +0000)]

rtwn: rename module (if_rtwn.ko -> rtwn.ko) to match module name + drop
manpage link.

Reported by: mav, hselasky

commit | commitdiff | tree

Phil Shafer [Fri, 9 Jun 2017 03:32:49 +0000 (03:32 +0000)]

Import libxo-0.8.1 with official fix to today's build break.

Submitted by: phil

commit | commitdiff | tree

Phil Shafer [Fri, 9 Jun 2017 03:30:07 +0000 (03:30 +0000)]

Import libxo 0.8.1

commit | commitdiff | tree

John Baldwin [Thu, 8 Jun 2017 21:34:54 +0000 (21:34 +0000)]

Add the ccr0 device to the opencrypto tests against the NIST KAT tests.

The ccr0 device supports both AES and SHA tests.

Sponsored by: Chelsio Communications

commit | commitdiff | tree

Gleb Smirnoff [Thu, 8 Jun 2017 21:33:19 +0000 (21:33 +0000)]

When we are in UMA_STARTUP, use startup_alloc() for any zone, not for
internal zones only. This allows new zones to be created at early stages
of boot without the need to mark them as internal to UMA, which isn't
always true.

Reviewed by: alc

commit | commitdiff | tree

John Baldwin [Thu, 8 Jun 2017 21:33:10 +0000 (21:33 +0000)]

Fix the software fallback for GCM to validate the existing tag for decrypts.

Sponsored by: Chelsio Communications

commit | commitdiff | tree

Gleb Smirnoff [Thu, 8 Jun 2017 21:30:34 +0000 (21:30 +0000)]

Listening sockets improvements.

o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take the selinfos out of the socket buffers and into the socket. The
    first reason is to support the braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is called on it. The second
    reason is that there is a future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep the same selinfos through the lifetime
    of a socket.
  - Remove struct so_accf. Since listening stuff now no longer
    affects struct socket size, just move its fields into the listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow acquiring two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() use atomic(9).  This allows soref() to be done in
    some situations without owning the socket lock.  There is room for
    improvement here: it is possible to make sorele() lock optionally as well.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of the global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parallelism in the UNIX sockets.
  - Now a call into uipc_attach() may happen with unp_link_lock held, if we
    are accepting, or without unp_link_lock, if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basically did nothing
    for a listening socket.  The vnode remained open for connections.  This
    is fixed by removing the vnode in uipc_close().  Maybe the right way would
    be to do it for all sockets (not only listening), simply moving the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D9770

commit | commitdiff | tree

John Baldwin [Thu, 8 Jun 2017 21:06:18 +0000 (21:06 +0000)]

Add explicit handling for requests with an empty payload.

- For HMAC requests, construct a special input buffer to request an empty
  hash result.
- For plain cipher requests and requests that chain an AES cipher with an
  HMAC, fail with EINVAL if there is no cipher payload.  If needed in
  the future, chained requests that only contain AAD could be serviced as
  HMAC-only requests.
- For GCM requests, the hardware does not support generating the tag for
  an AAD-only request.  Instead, complete these requests synchronously
  in software on the assumption that such requests are rare.
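
The policy in the list above can be summarised with a small Python model. It is only a sketch of the decision logic, not driver code: the request kinds, the return tuples, and the use of HMAC-SHA256 are assumptions chosen for illustration.

```python
import errno
import hashlib
import hmac

def handle_empty_payload(kind, key=b"k"):
    """Decide how a request with an empty payload is serviced.

    Returns a (path, result) tuple, where path is one of
    "hardware", "error", or "software".
    """
    if kind == "hmac":
        # An empty hash result is still well defined: HMAC over zero bytes.
        return "hardware", hmac.new(key, b"", hashlib.sha256).digest()
    if kind in ("cipher", "cipher+hmac"):
        # No cipher payload: reject the request.
        return "error", errno.EINVAL
    if kind == "gcm":
        # AAD-only GCM: completed synchronously in software.
        return "software", None
    raise ValueError("unknown request kind")

for kind in ("hmac", "cipher", "cipher+hmac", "gcm"):
    print(kind, handle_empty_payload(kind)[0])
```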

Sponsored by: Chelsio Communications

commit | commitdiff | tree

Jonathan T. Looney [Thu, 8 Jun 2017 20:47:18 +0000 (20:47 +0000)]

With EARLY_AP_STARTUP enabled, we are seeing crashes in softclock_call_cc()
during bootup. Debugging information shows that softclock_call_cc() is
trying to execute the vt_consdev.vd_timer callout, and the callout
structure contains a NULL c_func.

This appears to be due to a race between vt_upgrade() running
callout_reset() and vt_resume_flush_timer() calling callout_schedule().

Fix the race by ensuring that vd_timer_armed is always set before
attempting to (re)schedule the callout.

Discussed with: emaste
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D9828

commit | commitdiff | tree

Jonathan T. Looney [Thu, 8 Jun 2017 20:41:28 +0000 (20:41 +0000)]

Add the infrastructure to support loading multiple versions of TCP
stack modules.

It adds support for mangling symbols exported by a module by prepending
a string to them. (This avoids overlapping symbols in the kernel linker.)

It allows the use of a macro as the module name in the DECLARE_MODULE()
and MODULE_VERSION() macros.

It allows the code to register stack aliases (e.g. both a generic name
["default"] and version-specific name ["default_10_3p1"]).

With these changes, it is trivial to compile TCP stack modules with
the name defined in the Makefile and to load multiple versions of the
same stack simultaneously. This functionality can be used to enable
side-by-side testing of an old and new version of the same TCP stack.
It also could support upgrading the TCP stack without a reboot.

Reviewed by: gnn, sjg (makefiles only)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D11086

commit | commitdiff | tree

Ed Maste [Thu, 8 Jun 2017 20:06:09 +0000 (20:06 +0000)]

arm64: add ".arch armv8-a+crc" to allow use of crc instructions

With Clang 5.0 the .arch directive is required, otherwise Clang
complains "error: instruction requires: crc".

This was reported in D10499 but not added initially, because Clang 3.8,
available on a reference machine, reported an unknown directive. Clang 4.0
allows but does not require the directive.

Submitted by: andrew
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
