Alan Cox [Sun, 24 Sep 2017 23:35:01 +0000 (23:35 +0000)]
Change vm_page_try_to_free() to require a managed page. Essentially,
vm_page_try_to_free() is testing conditions, like clean versus dirty,
that only vary in managed pages.
Suggested by: kib
Reviewed by: markj
X-MFC after: never
Alan Cox [Sun, 24 Sep 2017 22:29:11 +0000 (22:29 +0000)]
Modernize the use of vm_page_unwire(). Since r288122, vm_page_unwire()
has returned TRUE when the wire count transitions to zero, eliminating
the need for callers to inspect the page's wire count.
Rick Macklem [Sun, 24 Sep 2017 20:05:48 +0000 (20:05 +0000)]
Change a panic to an error return.
There was a panic() in the NFS server's write operation that didn't
need to be a panic() and could just be an error return.
This patch makes that change.
Found by code inspection during development of the pNFS service.
g_resize_provider_event: Do not invoke orphan method twice
Like r266444, g_resize_provider_event can attempt to orphan an already
orphaned geom_dev consumer. This will cause a panic in g_dev_orphan. Apply
the same fix as was applied to g_orphan_register.
Rick Macklem [Sun, 24 Sep 2017 19:43:31 +0000 (19:43 +0000)]
Remove 0 filling from nfsm_uiombuflist().
nfsm_uiombuflist() zero filled the mbuf list to a multiple of 4 bytes
as required for XDR. Unfortunately that modified an mbuf list after
it was m_copym()'d and was broken. This patch removes the zero filling code.
Since nfsm_uiombuflist() is not yet used in head/current, this has no
effect on users.
The function will be used by a future commit of code that adds Flex
File Layout support.
Alan Cox [Sun, 24 Sep 2017 16:50:10 +0000 (16:50 +0000)]
Optimize vm_page_try_to_free(). Specifically, the call to pmap_remove_all()
can be avoided when the page's containing object has a reference count of
zero. (If the object has a reference count of zero, then none of its pages
can possibly be mapped.)
Address nearby style issues in vm_page_try_to_free(), and change its
return type to "bool".
Andrew Turner [Sun, 24 Sep 2017 09:33:08 +0000 (09:33 +0000)]
Add i.MX6 and Xilinx to GENERIC.
Merge in the missing devices from the IMX6 and ZEDBOARD kernel configs. The
Freescale sdma device has been renamed to fslsdma to mark it as a platform
specific driver.
Reviewed by: ian
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11564
Alan Cox [Sun, 24 Sep 2017 02:50:59 +0000 (02:50 +0000)]
Since the page "frame" doesn't belong to a vm object, it can't be paged
out. Since it can't be paged out, it is never actually enqueued in a
paging queue. Nonetheless, passing PQ_INACTIVE to vm_page_unwire()
creates the appearance that the page "frame" is being enqueued in the
inactive queue. As of r288122, we can avoid this false impression by
passing PQ_NONE.
Enji Cooper [Sun, 24 Sep 2017 00:14:48 +0000 (00:14 +0000)]
Convert some idioms over to py3k-compatible idioms
- Import print_function from __future__ and use print(..) instead of `print ..`.
- Use repr instead of backticks when the object needs to be dumped, unless
print(..) can do it lazily. Use str instead of backticks as appropriate
for simplification reasons.
This doesn't fully convert these modules over to py3k. It just gets over some of
the trivial compatibility hurdles.
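As a small illustration of the idioms listed above (a sketch only, not code taken from the converted modules; the Packet class and describe() helper are hypothetical):

```python
from __future__ import print_function


class Packet(object):
    """Hypothetical object used only to contrast repr() and str()."""

    def __init__(self, payload):
        self.payload = payload

    def __repr__(self):
        return "Packet(%r)" % (self.payload,)

    def __str__(self):
        return "packet with %d bytes" % len(self.payload)


def describe(pkt):
    # Python 2 only: "dump: " + `pkt`  (backticks are a syntax error in py3k).
    dumped = "dump: " + repr(pkt)
    # str() is enough when a human-readable form is wanted.
    summary = "summary: " + str(pkt)
    return dumped, summary


if __name__ == "__main__":
    for line in describe(Packet("abc")):
        # Python 2 only: print line
        print(line)
```

The same file runs unchanged under Python 2.7 and Python 3, which is the point of the conversion.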
Stephen Hurd [Sat, 23 Sep 2017 16:59:37 +0000 (16:59 +0000)]
bnxt: Choose better HW LRO defaults for performance
1) Choose correct Firmware options for HW LRO for best performance
2) Delete TBD and other comments which are not required.
3) Added sysctl interface to enable / disable / modify different factors
of HW LRO.
4) Disabled HW LRO by default to avoid issues with packet forwarding
This allows much better control over the LRO configuration via sysctls, and
uses much better defaults. Hardware LRO can now be enabled/disabled
independently from the software LRO, and the tuning parameters are exposed.
Stephen Hurd [Sat, 23 Sep 2017 01:39:16 +0000 (01:39 +0000)]
Make struct grouptask gt_name member a char array
Previously, it was just a pointer which was copied, but
some callers pass in a stack variable which will go out of scope.
Add GROUPTASK_NAMELEN macro (32) and snprintf() the name into it,
using "grouptask" if name is NULL. We can now safely include
gtask->gt_name in console messages.
Stephen Hurd [Sat, 23 Sep 2017 01:33:20 +0000 (01:33 +0000)]
Some small packet performance improvements
If the packet is smaller than MTU, disable the TSO flags.
Move TCP header parsing inside the IS_TSO?() test.
Add a new IFLIB_NEED_ZERO_CSUM flag to indicate the checksums need to be zeroed before TX.
Continuing efforts to provide hardening of FFS, this change adds a
check hash to cylinder groups. If a check hash fails when a cylinder
group is read, no further allocations are attempted in that cylinder
group until it has been fixed by fsck. This avoids a class of
filesystem panics related to corrupted cylinder group maps. The
hash is done using crc32c.
Check hashes are added only to UFS2 and not to UFS1 as UFS1 is primarily
used in embedded systems with small memories and low-powered processors
which need as light-weight a filesystem as possible.
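The check-hash idea can be sketched in userland as follows. This is a model under stated assumptions, not the FFS code: the position of the hash word (CKHASH_OFFSET), its little-endian encoding, and the helper names are all hypothetical, and the kernel's CRC seed and finalization may differ from the textbook CRC-32C used here.

```python
import struct

CKHASH_OFFSET = 0  # hypothetical position of the check-hash word


def crc32c(data, crc=0):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc ^= 0xFFFFFFFF
    for byte in bytearray(data):
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def compute_ckhash(cg):
    """Hash the cylinder group image with the hash word itself zeroed."""
    buf = bytearray(cg)
    buf[CKHASH_OFFSET:CKHASH_OFFSET + 4] = b"\0\0\0\0"
    return crc32c(buf)


def seal(cg):
    """Return a copy of cg with its check hash filled in (the write path)."""
    buf = bytearray(cg)
    struct.pack_into("<I", buf, CKHASH_OFFSET, compute_ckhash(buf))
    return bytes(buf)


def usable_for_allocation(cg):
    """Read path: refuse to allocate from a group whose hash does not match."""
    stored, = struct.unpack_from("<I", cg, CKHASH_OFFSET)
    return stored == compute_ckhash(cg)


if __name__ == "__main__":
    good = seal(b"\0\0\0\0" + b"free-block-map")
    damaged = bytearray(good)
    damaged[-1] ^= 1  # flip one bit in the map
    print(usable_for_allocation(good), usable_for_allocation(bytes(damaged)))
```

A single flipped bit in the map makes the group unusable for allocation until it is repaired, which mirrors the behaviour described above.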
Specifics of the changes:
sys/sys/buf.h:
Add BX_FSPRIV to reserve a set of eight b_xflags that may be used
by individual filesystems for their own purpose. Their specific
definitions are found in the header files for each filesystem
that uses them. Also add fields to struct buf as noted below.
sys/kern/vfs_bio.c:
It is only necessary to compute a check hash for a cylinder
group when it is actually read from disk. When calling bread,
you do not know whether the buffer was found in the cache or
read. So a new flag (GB_CKHASH) and a pointer to a function to
perform the hash has been added to breadn_flags to say that the
function should be called to calculate a hash if the data has
been read. The check hash is placed in b_ckhash and the B_CKHASH
flag is set to indicate that a read was done and a check hash
calculated. Though a rather elaborate mechanism, it should
also work for check hashing other metadata in the future. A
kernel internal API change was to change breada into a static
function and add flags and a function pointer to a check-hash
function.
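A toy model of the "hash only on a real read" rule described for vfs_bio.c may help. Everything here is hypothetical (the BufferCache class, its bread() signature, and the stand-in hash, which is plain CRC-32 from zlib rather than crc32c); it only shows why the hash callback is passed into the read routine instead of being called by the caller.

```python
import zlib


def stand_in_hash(data):
    """Stand-in for the filesystem's check-hash function (not crc32c)."""
    return zlib.crc32(data) & 0xFFFFFFFF


class BufferCache(object):
    """Hypothetical cache: block number -> data, backed by a dict 'disk'."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def bread(self, blkno, ckhash_fn=None):
        """Return (data, ckhash); ckhash is None when no read was done."""
        if blkno in self.cache:
            # Cache hit: the data was verified when it was first read.
            return self.cache[blkno], None
        data = self.disk[blkno]  # the actual "disk" read
        self.cache[blkno] = data
        ckhash = ckhash_fn(data) if ckhash_fn is not None else None
        return data, ckhash


if __name__ == "__main__":
    bc = BufferCache({7: b"cylinder group 7"})
    print(bc.bread(7, stand_in_hash)[1] is not None)  # first call reads
    print(bc.bread(7, stand_in_hash)[1] is None)      # second call hits cache
```

The caller only compares hashes when one is returned, so cached buffers are not re-hashed on every access.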
sys/ufs/ffs/fs.h:
Add flags for types of check hashes; stored in a new word in the
superblock. Define corresponding BX_ flags for the different types
of check hashes. Add a check hash word in the cylinder group.
sys/ufs/ffs/ffs_alloc.c:
In ffs_getcg do the dance with breadn_flags to get a check hash and
if one is provided, check it.
sys/ufs/ffs/ffs_vfsops.c:
Copy across the BX_FFSTYPES flags in background writes.
Update the check hash when writing out buffers that need them.
sys/ufs/ffs/ffs_snapshot.c:
Recompute check hash when updating snapshot cylinder groups.
sys/libkern/crc32.c:
lib/libufs/Makefile:
lib/libufs/libufs.h:
lib/libufs/cgroup.c:
Include libkern/crc32.c in libufs and use it to compute check
hashes when updating cylinder groups.
Four utilities are affected:
sbin/newfs/mkfs.c:
Add the check hashes when building the cylinder groups.
sbin/fsck_ffs/fsck.h:
sbin/fsck_ffs/fsutil.c:
Verify and update check hashes when checking and writing cylinder groups.
sbin/fsck_ffs/pass5.c:
Offer to add check hashes to existing filesystems.
Precompute check hashes when rebuilding cylinder group
(although this will be done when it is written in fsutil.c
it is necessary to do it early before comparing with the old
cylinder group)
sbin/dumpfs/dumpfs.c
Print out the new check hash flag(s)
sbin/fsdb/Makefile:
Needs to add libufs now used by pass5.c imported from fsck_ffs.
Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput
we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr
offset and length to the offset and length of the BIO from
zvol_write()->zvol_log_write(): these offset and length are later used
to take a range lock in zilog->zl_get_data function: zvol_get_data().
Now suppose we have a ZVOL with blocksize=8K and push 4K writes to
offset 0: we will only be range-locking 0-4096. This means the
ASSERTion we make in dbuf_unoverride() is no longer valid because now
dmu_sync() is called from zilog's get_data functions holding a partial
lock on the dbuf.
Fix this by taking a range lock on the whole block in zvol_get_data().
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: LOLi <loli10K@users.noreply.github.com>
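The arithmetic behind "range lock on the whole block" can be sketched as below. This is an illustration only; block_aligned_range() is a hypothetical helper, not a function from the ZFS sources, and it simply widens a byte range to the boundaries of the blocks it touches.

```python
def block_aligned_range(offset, length, blocksize):
    """Widen [offset, offset + length) to whole blocks of size blocksize."""
    start = offset - (offset % blocksize)
    end = offset + length
    end = ((end + blocksize - 1) // blocksize) * blocksize
    return start, end - start


if __name__ == "__main__":
    # The case from the message: blocksize=8K, 4K write at offset 0.
    print(block_aligned_range(0, 4096, 8192))
    # A 4K write to the second half of the same block locks the same range.
    print(block_aligned_range(4096, 4096, 8192))
```

With the widened range, two 4K writes to the same 8K block contend for the same lock instead of each holding a partial lock on the dbuf.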
https://www.illumos.org/issues/8661
The "zil-cw1" dtrace probe was previously removed in 8558, and the "zil-cw2"
probe should have been removed in that patch as well. Unfortunately, the
"zil-cw2" probe was not removed in 8558, so this bug is to track its removal.
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
https://www.illumos.org/issues/8600
ZFS channel programs should be able to create snapshots.
In addition to the base snapshot functionality, this will likely entail adding
extra logic to handle edge cases which were formerly not possible, such as
creating then destroying a snapshot in the same transaction sync.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
https://www.illumos.org/issues/8592
ZFS channel programs should be able to perform a rollback. This logic will
probably look pretty similar to zfs.sync.destroy().
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brad Lewis <brad.lewis@delphix.com>
https://www.illumos.org/issues/8502
The code in lib/libzfs/common/libzfs_mount.c already basically handles
the case when libshare is not installed. We just need to not fail in
zfs_init_libshare_impl. I tested this in lx and things work as
expected. I also tested there trying to set sharenfs and sharesmb on
the delegated dataset. Neither is allowed from within a zone. The
spew of msgs from a native zone is not ZFS specific. I see the same
spew simply running the share command.
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
Toomas Soome [Fri, 22 Sep 2017 07:29:26 +0000 (07:29 +0000)]
libefi: pdinfo_t pd_unit and pd_open should be unsigned
The device index, partition index and reference counter are all positive
numbers. However, since our internal partition number may be negative
to indicate a GPT table, the compare expression needs to take care when comparing
pdinfo_t and partition data.
John Baldwin [Fri, 22 Sep 2017 00:34:46 +0000 (00:34 +0000)]
Support AEAD requests with non-GCM algorithms.
In particular, support chaining an AES cipher with an HMAC for a request
including AAD. This permits submitting requests from userland to encrypt
objects like IPSec packets using these algorithms.
In the non-GCM case, the authentication crypto descriptor covers both the
AAD and the ciphertext. The GCM case remains unchanged. This matches
the requests created internally in IPSec. For the non-GCM case, the
COP_F_CIPHER_FIRST is also supported since the ordering matters.
Note that while this can be used to simulate IPSec requests from userland,
this ioctl cannot currently be used to perform TLS requests using AES-CBC
and MAC-before-encrypt.
Reviewed by: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D11759
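For the non-GCM case, the coverage of the authentication step can be sketched in userland. This is a minimal sketch, assuming HMAC-SHA256 and treating the ciphertext as opaque bytes (the Python standard library has no AES, so no encryption is performed here); the function names are hypothetical and this is not the crypto(4) interface.

```python
import hashlib
import hmac


def authenticate(key, aad, ciphertext):
    """MAC over the AAD followed by the ciphertext, as described above."""
    return hmac.new(key, aad + ciphertext, hashlib.sha256).digest()


def verify(key, aad, ciphertext, tag):
    """Constant-time check that tag matches the AAD and ciphertext."""
    return hmac.compare_digest(authenticate(key, aad, ciphertext), tag)


if __name__ == "__main__":
    key = b"k" * 32
    aad = b"header"
    ciphertext = b"\x01\x02\x03\x04"  # opaque stand-in for cipher output
    tag = authenticate(key, aad, ciphertext)
    print(verify(key, aad, ciphertext, tag))      # unmodified request
    print(verify(key, b"Header", ciphertext, tag))  # tampered AAD
```

Because the MAC input includes the AAD, changing either the AAD or the ciphertext invalidates the tag.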
John Baldwin [Fri, 22 Sep 2017 00:15:54 +0000 (00:15 +0000)]
Place the AAD before the plaintext/ciphertext for CIOCCRYPTAEAD.
Software crypto implementations don't care how the buffer is laid out,
but hardware implementations may assume that the AAD is always before
the plain/cipher text and that the hash/tag is immediately after the end
of the plain/cipher text.
In particular, this arrangement matches the layout of both IPSec packets
and TLS frames. Linux's crypto framework also assumes this layout for
AEAD requests.
Reviewed by: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D11758
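The buffer layout described above (AAD, then plain/cipher text, then hash/tag) can be written down as a pair of helpers. These are hypothetical illustrations of the ordering only, not the kernel's data structures.

```python
def pack_request(aad, text, tag_len):
    """Lay out one contiguous buffer: AAD | text | space for the tag."""
    return aad + text + b"\0" * tag_len


def unpack_request(buf, aad_len, text_len, tag_len):
    """Split a buffer laid out by pack_request() back into its parts."""
    text_end = aad_len + text_len
    return buf[:aad_len], buf[aad_len:text_end], buf[text_end:text_end + tag_len]


if __name__ == "__main__":
    buf = pack_request(b"AAD!", b"payload", 16)
    print(unpack_request(buf, 4, 7, 16))
```

With a fixed ordering, a consumer needs only the three lengths to locate each region.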
Warner Losh [Thu, 21 Sep 2017 23:10:56 +0000 (23:10 +0000)]
Always create usr/local/etc -> /etc/local symlink
/usr/local/etc gets created and populated by packages. However, if no
packages are installed when setup_nanobsd is run, this symlink won't
get created, causing problems if packages are installed later (say on
first boot). Therefore, always create the symlink and etc/local. It
does no harm and may help.
Inspired by crochet issue #183 (confusingly says NanoBSD, means crochet)
Sponsored by: Netflix
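The effect of the change can be modelled outside nanobsd.sh as follows. This is a sketch under assumptions: ensure_local_etc() is a hypothetical helper, the relative link target is chosen for illustration, and the case where usr/local/etc already exists as a populated directory is deliberately left alone rather than migrated.

```python
import os
import tempfile


def ensure_local_etc(root):
    """Create etc/local and the usr/local/etc symlink under root."""
    etc_local = os.path.join(root, "etc", "local")
    usr_local = os.path.join(root, "usr", "local")
    link = os.path.join(usr_local, "etc")
    os.makedirs(etc_local, exist_ok=True)
    os.makedirs(usr_local, exist_ok=True)
    if not os.path.lexists(link):
        # Relative target, so the link stays valid if root is relocated.
        os.symlink(os.path.join("..", "..", "etc", "local"), link)
    return link


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    link = ensure_local_etc(root)
    print(os.path.islink(link), os.readlink(link))
```

Calling the helper again is harmless, which matches the "does no harm" intent of always creating the link.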
John Baldwin [Thu, 21 Sep 2017 23:05:32 +0000 (23:05 +0000)]
Only handle _PC_MAX_CANON, _PC_MAX_INPUT, and _PC_VDISABLE for TTY devices.
Move handling of these three pathconf() variables out of vop_stdpathconf()
and into devfs_pathconf() as TTY devices can only be devfs files. In
addition, only return settings for these three variables for devfs devices
whose device switch has the D_TTY flag set.
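From userland, the three variables can be queried with os.fpathconf(). The sketch below is hedged: tty_limits() is a hypothetical helper, the names are looked up in os.pathconf_names because not every platform defines them, and the values or errors returned depend on the operating system and the file being queried.

```python
import os

TTY_NAMES = ("PC_MAX_CANON", "PC_MAX_INPUT", "PC_VDISABLE")


def tty_limits(fd):
    """Return {name: value or None} for the three TTY pathconf variables."""
    limits = {}
    for name in TTY_NAMES:
        if name not in os.pathconf_names:
            limits[name] = None  # not defined on this platform
            continue
        try:
            limits[name] = os.fpathconf(fd, name)
        except OSError:
            limits[name] = None  # the system rejected the query for this fd
    return limits


if __name__ == "__main__":
    # /dev/null is a device but not a TTY; results are platform dependent.
    fd = os.open(os.devnull, os.O_RDONLY)
    try:
        print(tty_limits(fd))
    finally:
        os.close(fd)
```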
Stephen Hurd [Thu, 21 Sep 2017 21:14:48 +0000 (21:14 +0000)]
Improved logging of gtaskqueue failures
Check the return code of intr_setaffinity() and log any errors
it returns. When a qid is not located, log an error before returning
failure. Also, use __func__ rather than hardcoding the function name.
cryptotest.py: Actually use NIST-KAT HMAC test vectors and test the right hashes
Previously, this test was entirely a no-op as no vector in the NIST-KAT file
has a precisely 20-byte key.
Additionally, not every vector in the file is SHA1. The length field
determines the hash under test, and is now decoded correctly.
Finally, due to a limitation I didn't feel like fixing in cryptodev.py, MACs
are truncated to 16 bytes in this test.
With this change and the uncommitted D12437 (to allow key sizes other than
those used in IPSec), the SHA tests in cryptotest.py actually test something
and e.g. at least cryptosoft passes the test.
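The two points above (the length field selects the hash, and MACs are compared after truncation to 16 bytes) can be sketched with the standard library. This is not the cryptotest.py code: the mapping from digest length to hash and the kat_matches() helper are assumptions made for illustration.

```python
import hashlib
import hmac

# Assumed mapping: full digest length in bytes -> hash under test.
HASH_BY_DIGEST_LEN = {
    20: hashlib.sha1,
    28: hashlib.sha224,
    32: hashlib.sha256,
    48: hashlib.sha384,
    64: hashlib.sha512,
}
MAC_LEN = 16  # compare only the first 16 bytes, as in the test described


def kat_matches(key, msg, digest_len, expected_mac):
    """Check one known-answer vector, comparing truncated MACs."""
    digestmod = HASH_BY_DIGEST_LEN[digest_len]
    mac = hmac.new(key, msg, digestmod).digest()
    return mac[:MAC_LEN] == expected_mac[:MAC_LEN]


if __name__ == "__main__":
    key = b"\x0b" * 20
    expected = hmac.new(key, b"Hi There", hashlib.sha256).digest()
    print(kat_matches(key, b"Hi There", 32, expected))
```

Selecting the hash from the length field is what keeps a SHA-256 vector from being checked against SHA-1 output.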
Stephen Hurd [Thu, 21 Sep 2017 20:34:33 +0000 (20:34 +0000)]
Fix M_GTASKQUEUE definition
Previously had the same short and long description as taskqueues.
This could cause problems with memguard(9) and vmstat -m which use
the short description as a unique identifier.
Stephen Hurd [Thu, 21 Sep 2017 20:27:43 +0000 (20:27 +0000)]
bnxt: Fix driver when attached to a VF
- Use HWRM_FUNC_VF_CFG instead of HWRM_FUNC_CFG on VFs
- Fix NPAR/VF detection
- Clean up flag definitions
- Don't allow WoL on VFs
Although the bnxt driver doesn't support SR-IOV and so cannot create VFs yet,
the PF could be running Linux or ESXi with a VF passed through to a
FreeBSD guest. This fixes the driver for that use case.
cryptotest.py: Do not run AES-CBC or AES-GCM tests on non-AES crypto(4) drivers
For some reason, we only skipped AES-XTS tests if a driver was not in the
aesmodules list. Skip other AES modes as well to prevent spurious failures
in non-AES drivers.
Alan Cox [Thu, 21 Sep 2017 15:32:41 +0000 (15:32 +0000)]
Modernize calls to vm_page_unwire(). As of r288122, vm_page_unwire()
accepts PQ_NONE as the specified queue and returns a Boolean indicating
whether the page's wire count transitioned to zero. Use these features
in dev/drm2.
Amend bin/cat/cat.c so the output is the same aside
from blank lines being numbered or unnumbered, depending on whether cat
was invoked with -ne or -be.
At present, when cat is invoked with -be, there is an additional
difference that the '$' on blank lines is placed on the far left of the
output.
Discussed in bug 210607.
While here, revert the workaround from r304035 which skipped the unit test for
this issue previously.
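The intended output can be modelled as below. This is a sketch, not cat.c: cat_number() is a hypothetical helper, and it assumes that with -be an unnumbered blank line is padded to the width of the number column so that its '$' lines up with the numbered case.

```python
def cat_number(lines, number_blank):
    """Model `cat -ne` (number_blank=True) and `cat -be` (number_blank=False)."""
    out = []
    n = 0
    for line in lines:
        if line or number_blank:
            n += 1
            prefix = "%6d\t" % n
        else:
            # Blank line under -b: no number, but keep the column padding.
            prefix = "%6s\t" % ""
        out.append(prefix + line + "$")
    return out


if __name__ == "__main__":
    for flag in (True, False):
        print("\n".join(cat_number(["a", "", "b"], flag)))
```

In this model the two modes differ only in whether the blank line carries a number.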
IMHO it is possible that failure will be treated as success because we don't
initialize nvp on every loop iteration and the code under 'fail'(!) label
detects success by checking of nvp != NULL.
Nick Hibma [Thu, 21 Sep 2017 10:13:48 +0000 (10:13 +0000)]
Remove an 'unused' function.
This function was only set in legacy.sh and only at the very end after
the disk image had been successfully created. The only difference will be
that the message 'Error encountered. Please check...' will not appear if
nanobsd.sh exits with an error after the disk image has been created.
Because nvp wasn't initialized on every loop iteration, once we jumped
to 'fail' on error it was treated as success, because nvp != NULL. Fix this
by not handling success under the 'fail' label and by using a separate
variable for the parent nvpair.
If we succeeded in allocating the nvlist but failed to allocate the nvpair,
we would leak nvls[ii] on return. Destroy it when we cannot allocate the
nvpair, before we goto fail.
Submitted by: pjd@ and oshogbo@ (minor changes)
Found by: scan-build
MFC after: 1 month
Sponsored by: Wheel Systems
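The stale-variable problem can be modelled in a few lines. This is an analogy in Python, not the C code: alloc(), build_buggy(), and build_fixed() are hypothetical, with None standing in for NULL and an early exit standing in for "goto fail".

```python
def alloc(item):
    """Hypothetical allocator: returns None to signal failure."""
    return None if item == "bad" else ("pair", item)


def build_buggy(items):
    nvp = None
    for item in items:
        pair = alloc(item)
        if pair is None:
            break  # models "goto fail"; nvp still holds the previous pair
        nvp = pair
    # Models the old 'fail' label: success inferred from nvp != NULL.
    return nvp is not None


def build_fixed(items):
    for item in items:
        if alloc(item) is None:
            return False  # failure is reported explicitly
    return True


if __name__ == "__main__":
    print(build_buggy(["a", "bad"]), build_fixed(["a", "bad"]))
```

The buggy variant reports success whenever any earlier iteration succeeded, which is the misreport the commit removes.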
The 'while (array != NULL) { }' suggests to scan-build that array may be
initially NULL, which is not possible. Change the loop to
'do {} while (array != NULL)' to satisfy scan-build and assert that
array really cannot be NULL just in case.
Submitted by: pjd@
Found by: scan-build
MFC after: 1 month
Sponsored by: Wheel Systems