pfg [Mon, 5 Feb 2018 15:02:35 +0000 (15:02 +0000)]
MFC r328567:
libedit: sort the Makefile in line with NetBSD's version.
NetBSD's libedit has been been cleaned-up considerably so the
non-widecharacter version is no longer an option. Re-sorting the
Makefile should make it easier for some brave soul trying to update it.
MFC r328770:
Merge r1.120 from NetBSD:
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
Reported by: Maxime Villard <max at m00nbsd dot net>
kevans [Mon, 5 Feb 2018 04:00:59 +0000 (04:00 +0000)]
MFC r304321,304753,304754,306751,316077,316110:
SHA512, skein, large block support for loader zfs
MFC r304321: Add SHA512, skein, large blocks support for loader zfs.
MFC r304753: loader: zio_checksum_verify() must test spa for NULL pointer
MFC r304754: r304321 broken bhyve zvol VM bhyveload hang 100% WCPU
MFC r306751: Disable loop unrolling in skein for sys/boot
MFC r316077: Unbreak compilation with gcc 4.2.1
MFC r316110: Use `-Wno-missing-declarations` with CWARNFLAGS for skein.c
mav [Fri, 2 Feb 2018 23:22:58 +0000 (23:22 +0000)]
MFC r303468 (by imp):
Move protocol specific stuff into a linker set object that's
per-protocol. This reduces the number scsi symbols references by
cam_xpt significantly, and eliminates all ata / nvme symbols. There's
still some NVME / ATA specific code for dealing with XPT_NVME_IO and
XPT_ATA_IO respectively, and a bunch of scsi-specific code, but this
is progress.
mav [Fri, 2 Feb 2018 23:19:20 +0000 (23:19 +0000)]
MFC r303467 (by imp):
Switch to linker sets to find the xport callback object. This
eliminates the need to special case everything in cam_xpt for new
transports. It is now a failure to not have a transport object when
registering the bus as well. You can still, however, create a
transport that's unspecified (XPT_)
r306395 by br:
Increase timeouts for geli tests. It takes 2-3x more time to proceed the
tests on MIPS64EB in QEMU.
Sponsored by: DARPA, AFRL
Sponsored by: HEIF5
r327346:
Fix potential TOCTTOU bug in the geli tests
This change mostly reverts r293436, which introduced the bug due to a belief
that geli(8) would allocate md(4) devices by itself. However, that belief is
incorrect. Instead of using linear probing to find available md(4) numbers,
it's best to use the existing attach_md function.
r327347:
geli: factor out some common code in the geli tests
No functional change.
Sponsored by: Spectra Logic Corp
r327352:
Fix a harmless typo from r310786
I copy/pasted a reference to an undefined shell variable.
r327353:
geli: fix the resize test on arm64
The resize test used bsdlabel(8), which is not available on all
architectures. Change it to use gpart(8) instead, which should be available
everywhere.
PR: 221763
Reported by: andrew
r327662:
geli: convert most tests from TAP to ATF
I'm leaving readonly_test and nokey_test alone for now. In a future commit
they should be broken up into several smaller test cases and distributed
between multiple files.
The trick is not to destroy an md(4) device during a test. That can create
a "double-free" situation, because we also destroy md devices during test
cleanup.
r327682:
Fix typo from r327666
X-MFC-With: 327666
r327683:
geli: convert remaining TAP tests to ATF
r327685:
geli: optimize tests
Reduce the geli tests' runtime by about a third:
* In integrity_test:copy, use a file-backed md(4) device instead of a
malloc'd one. That way we can corrupt the underlying storage without
needing to detach and reattach the geli device.
* In integrity_test:{copy, hmac, data} and onetime_test:{onetime,
onetime_a}, move reads of /dev/random out of the loop.
kevans [Fri, 2 Feb 2018 21:25:32 +0000 (21:25 +0000)]
MFC r328504: stand/fdt: Consolidate overlay handling a little further
[This is effectively a direct commit to stable/11 due to path restructuring
in HEAD. The diff against HEAD has simply been applied to the old path]
This should have been done as part of r327350, but due to lack of foresight
it came later. In the different places we apply overlays, we duplicate the
bits that check for fdt_overlays in the environment and supplement that with
any other places we need to check for overlays to load. These "other places"
will be loader specific and are not candidates for consolidation.
Provide an fdt_load_dtb_overlays to capture the common logic, allow passing
in an additional list of overlays to be loaded. This additional list of
overlays is used in practice for ubldr to pull in any fdt_overlays passed to
it from U-Boot environment, but it can be used for any other source of
overlays.
These additional overlays supplement loader.conf(5) fdt_overlays, rather
than replace, so that we're not restricted to specifying overlays in only
one place. This is a change from previous behavior where loader.conf(5)
supplied fdt_overlays would cause us to ignore U-Boot environment, and this
seems nonsensical- user should have sufficient control over both of these
aspects, or lack of control for good reasons.
A knob could be considered in the future to ignore U-Boot supplied overlays,
but the supplemental treatment seems like a good start.
mav [Fri, 2 Feb 2018 18:03:14 +0000 (18:03 +0000)]
MFC r307567 (by sbruno): Assert that we're assigning a non-null taskqueue.
ref: https://github.com/NextBSD/NextBSD/commit/535865d02c162e415d7436899cd6db5000a0cc7b
Fix cpu assignment by assuring stride is non-zero, assert that all tasks
have a valid taskqueue.
ref: https://github.com/NextBSD/NextBSD/commit/db398176234fe3ce9f8e8b671f56000f8276feba
New attribute begemotSnmpdCommunityPermission can be used to specify access
rights: 1 means "read-only" access, 2 means "read-write" access. If
attribute is not specified for some index this means "read-only" rights.
Community strings must be unique, i.e. must not be the same for different
indexes.
mav [Thu, 1 Feb 2018 21:23:42 +0000 (21:23 +0000)]
MFC r326937, r326940 (by imp):
When we're disabling the nvme device, some drives have a controller
bug that requires 'hands off' for a period of time (2.3s) before we
check the RDY bit. Sicne this is a very odd quirk for a very limited
selection of drives, do this as a quirk. This prevented a successful
reset of the card when the card wedged.
Also, make sure that we comply with the advice from section 3.1.5 of
the 1.3 spec says that transitioning CC.EN from 0 to 1 when CSTS.RDY
is 1 or transitioning CC.EN from 1 to 0 when CSTS.RDY is 0 "has
undefined results". Short circuit when EN == RDY == desired state.
Finally, fail the reset if the disable fails. This will lead to a
failed device, which is what we want. (note: nda device needs
work for coping with a failed device).
mav [Thu, 1 Feb 2018 21:14:54 +0000 (21:14 +0000)]
MFC r324978: Report only the valid slots in the firmware log page.
Printing the entire log page is causing confusion over available
slots. Report only those slots that are valid. In the case where the
firmware download isn't supported, assume that only the first slot is
valid (I have no hardware to test this assumption though)
mav [Thu, 1 Feb 2018 21:12:09 +0000 (21:12 +0000)]
MFC r323625 (by imp): Allow multiple TRIMs to be done for nda
Don't call cam_iosched_trim_done or cam_iosched_submit_trim for nda
since its hardware can handle almost an arbitrary number of TRIMs and
we don't have to be careful to only ever do one.
mav [Thu, 1 Feb 2018 21:11:17 +0000 (21:11 +0000)]
MFC r322999 (by imp): Fix NVMe's use of XPT_GDEV_TYPE
This patch changes the way XPT_GDEV_TYPE works for NVMe. The current
ccb_getdev structure includes pointers to the NVMe Identify Controller
and Namespace structures, but these are kernel virtual addresses which
are not accessible from user space.
As an alternative, the patch changes the pointers into padding in
ccb_getdev and adds two new types to ccb_dev_advinfo to retrieve the
Identify Controller (CDAI_TYPE_NVME_CNTRL) and Namespace
(CDAI_TYPE_NVME_NS) data structures.
mav [Thu, 1 Feb 2018 21:04:10 +0000 (21:04 +0000)]
MFC r320522 (by imp):
Fix sign of resid and add a mostly useless cast to cope with signed vs
unsigned check warnings from traditional unix code construsts bogusly
flagged as potentially unsafe.
After review by the WDC engineers, improve how we pull down the
so-called 'e6' logs. The 'c6' logs are obsolete and support for them
has been removed because FreeBSD needed to pull them in chunks, which
is incompatible with the 0xc6 opcode implementation. Rather than leave
the code in place that produces bad log pulls, remove it.
mav [Thu, 1 Feb 2018 19:43:51 +0000 (19:43 +0000)]
MFC r314229 (by imp):
Exit with usage if argv[1] is NULL in dispatch. This fixes core dumps
when a command has subcommands, but the user doesn't give the
parameters on the command line.
mav [Thu, 1 Feb 2018 19:41:46 +0000 (19:41 +0000)]
MFC r313259 (by imp):
Use ssize_t instead of uint32_t to prevent warnings about a comparison
with different signs. Due to the promotion rules, this would only
happen on 32-bit platforms.
mav [Thu, 1 Feb 2018 19:40:51 +0000 (19:40 +0000)]
MFC r313258 (by imp):
Add the ability to dump log pages directly in binary to stdout.
Update man page to include this flag, and an example of dumping a
vendor-specific page while I'm here.
mav [Thu, 1 Feb 2018 19:39:29 +0000 (19:39 +0000)]
MFC r313257 (by imp):
Add some descriptions to the man page for the supported log pages as
well as the new wdc commands. Make wdc be an alias for hgst when
specifying the vendor to use to interpret the page.
mav [Thu, 1 Feb 2018 19:37:50 +0000 (19:37 +0000)]
MFC r313191 (by imp):
Implement 5 wdc-specific nvme control options for their HGST drives:
wdc cap-diag Capture diagnostic data from drive
wdc drive-log Capture drive history data from drive
wdc get-crash-dump Retrieve firmware crash dump from drive
mav [Thu, 1 Feb 2018 19:35:34 +0000 (19:35 +0000)]
MFC r313111 (by imp):
Use aligned buffer for the firmware data. Otherwise, when loading a
MAXPHYS bytes of data, the I/O would require MAXPHYS + PAGE_SIZE worth
of pages to do the I/O and we'd hit an assertion in
vm_fault_quick_hold_pages unless MAXPHYS was larger than 1M +
PAGE_SIZE.
mav [Thu, 1 Feb 2018 19:32:45 +0000 (19:32 +0000)]
MFC r309413 (by imp):
Flag the vendor specific pages as such. This allows different decoding
for the same page number as different vendors encode vendor specific
pages differently.
mav [Thu, 1 Feb 2018 19:31:39 +0000 (19:31 +0000)]
MFC r308869 (by imp):
i386 turns out to not have __uint128_t. So confusingly use 64-bit math
instead. Since we're little endian, we can get away with it. Also,
since the counters in quesitons would require billions of iops for
tens of billions of seconds to overflow, and since such data rates are
unlikely for people using i386 for a while, that's OK. The fastest
cards today can't do even a million IOPs.
mav [Thu, 1 Feb 2018 19:30:37 +0000 (19:30 +0000)]
MFC r308856 (by imp):
Decode the Intel-specific Additional SMART data page (0xca) and print
it in human readable form. Include a pointer to the public spec that
was followed to implement this in the code. Samsung also implements
page 0xca on some of their drives, but the format is slighly
different, so the code skips printing zero keys. Samsung's log page
has additional, unknown data after the end of Intel defined data which
isn't displayed.
mav [Thu, 1 Feb 2018 19:30:02 +0000 (19:30 +0000)]
MFC r308848 (by imp):
Remove check for valid log pages. Let the drive tell us which pages
are valid or not. While many pages are reserved in the standard, that
doesn't make them invalid and future versions of the standard may
define then.
mav [Thu, 1 Feb 2018 19:27:10 +0000 (19:27 +0000)]
MFC r303125 (by imp):
Remove some bogus comments and printfs. Also, we can't
cam_periph_releaes_locked() at the end of nvme_probe_start because we
hit an assertion which may be bogus. Instead, leak a periph until we
sort it out. Since these devices don't arrive and depart often, so
this is the lessor of two evils.
mav [Thu, 1 Feb 2018 19:13:19 +0000 (19:13 +0000)]
MFC r324632 (by imp):
Be nicer on the dump stack by allocating only a ccb_nvmeio rather than
a full ccb. This saves a few hundre bytes, which might be important
during a crash dump...
mav [Thu, 1 Feb 2018 19:07:21 +0000 (19:07 +0000)]
MFC r324644 (by imp):
Closer examination shows that nvme and CAM both normally zero-fill
allocations (for req and ccb, which ultimately contain the
nvme_cmd). As such, we can micro-optimize these routines. Add a
comment to this effect, and bzero the ccb used to make the requests
for the nda dump rotuine so it more closely matches a ccb allocated
with xpt_get_ccb().
mav [Thu, 1 Feb 2018 19:06:42 +0000 (19:06 +0000)]
MFC r324634 (by imp):
Use nvme_ctrlr_poll instead of nvme_ctrlr_intx_handler since it is
more general and doesn't try to access registers that may be undefined
when the card is in MSIX mode.
This change, along with r324630, r324631, r324632, makes nda crash
dumps work again. Previously, they only worked on CPU 0 when the stack
garbage was just so.
mav [Thu, 1 Feb 2018 19:05:48 +0000 (19:05 +0000)]
MFC r324633 (by imp):
Create general polling function for the nvme controller. Use it when
we're doing the various pin-based interrupt modes. Adjust
nvme_ctrlr_intx_handler to use nvme_ctrlr_poll.
mav [Thu, 1 Feb 2018 19:04:50 +0000 (19:04 +0000)]
MFC r324631 (by imp):
Explicitly set reserved fields and 'fuse' to 0. This prevents us from
acidentally sending bogus values in these fields, which some drives
may reject with an error or worse (undefined behavior).
This is especially needed for the ndadump routine which allocates the
cmd from stack garbage....
mav [Thu, 1 Feb 2018 19:01:06 +0000 (19:01 +0000)]
MFC r324075 (by imp): Tweak performance of nda completions
Use xpt_done_direct in preference to xpt_done when completing a
successful I/O. Continue to use xpt_done when there's an error, or for
completion of the submission of a CCB. This eliminates a context
switch to the cam_doneq thread.
mav [Thu, 1 Feb 2018 19:00:05 +0000 (19:00 +0000)]
MFC r323834 (by imp): Fix queue depth for nda.
1/4 of the number of queues times queue entries is too limiting. It
works up to about 4k IOPS / 3.0GB/s for hardware that can do
4.4k/3.2GB/s with nvd. 3/4 works better, though it highlights issues
in the fairness of nda's choice of TRIM vs READ. That will be fixed
separately.
mav [Thu, 1 Feb 2018 18:59:03 +0000 (18:59 +0000)]
MFC r322998 (by imp):
Fix a few overlooked spots where the coded uses 16-bit NSIDs. Chuck
Tuffli had submitted a more thorough patch that I was unaware of when
I did my work and this brings in the bits I missed from that patch.
mav [Thu, 1 Feb 2018 16:53:08 +0000 (16:53 +0000)]
MFC r322995 (by imp):
Add new compile-time option NVME_USE_NVD that sets the default value
of the runtime hw.nvme.use_vnd tunable. We still default to nvd unless
otherwise requested.
mav [Thu, 1 Feb 2018 16:52:03 +0000 (16:52 +0000)]
MFC r322994: Set the max transactions for NVMe drives better.
Provided a better estimate for the number of transactions that can be
pending at one time. This will be number of queues * number of
trackers / 4, as suggested by Jim Harris. This gives a better estimate
of the number of transactions that CAM should queue before applying
back pressure. This should be revisted when we have real multi-queue
support in CAM and the upper layers of the I/O stack.
mav [Thu, 1 Feb 2018 16:51:11 +0000 (16:51 +0000)]
MFC r322903 (by imp):
Fill in reserved areas from NVMe spec in the IDENTIFY structure
(struct nvme_controller_data) as defined in the NVM Express
specification, revsion 1.3.
o Automomous Power State Transition
o Host Memory Buffer
o Timestamp
o Keep Alive Timer
o Host Controlled Thermal Management
o Non-Operational Power State Config
Also note that feature codes 0x78-0x7f are reserved for the NVMe
Management Interface.
mav [Thu, 1 Feb 2018 16:45:44 +0000 (16:45 +0000)]
MFC r322872 (by imp):
Enable bus mastering on the device before resetting the device. The
card has to do PCIe transactions to complete the reset process, but
can't do them, per the PCIe spec, unless bus mastering is enabled.
mav [Thu, 1 Feb 2018 16:45:08 +0000 (16:45 +0000)]
MFC r322443 (by nwhitehorn):
Move NVME controller shutdown from being called as part of module unloading
to being called through the newbus DEVICE_SHUTDOWN() path. This ensures that
the NVME controller gets shut down before the device and bus disappear
and prevents data corruption on shutdown on at least Samsung EVO 960 SSDs.
mav [Thu, 1 Feb 2018 16:40:37 +0000 (16:40 +0000)]
MFC r322036 (by imp):
Make nvd vs nda choice boot-time rather than build-time
Introduce hw.nvme.use_nvd tunable. This tunable allows both nvd and
nda to be installed in the kernel, while allowing only one of them to
create devices. This is an all-or-nothing setting, and you can't
change it after boot-time. However, it will allow easier A/B testing.
mav [Thu, 1 Feb 2018 16:35:40 +0000 (16:35 +0000)]
MFC r320984 (by imp):
This adds CAM pass(4) support for NVMe IO's. Applications indicate
the IO type (Admin or NVM) using XPT op-codes XPT_NVME_ADMIN or
XPT_NVME_IO.
mav [Thu, 1 Feb 2018 16:27:10 +0000 (16:27 +0000)]
MFC r314889 (by imp):
Avoid dereferencing unintialized elements in the error path.
Some drives sometimes have errors for things like setting the number
of queue entries in the submission queue. The error paths taken for
these drives ensure a panic dereferencing uninialized data.
mav [Thu, 1 Feb 2018 16:26:35 +0000 (16:26 +0000)]
MFC r314884 (by imp): Make multi-namespace nvme drives more robust.
Fix assumptions about name spaces in NVME driver. First, it assumes
cdata.nn is the number of configured devices. However, it is the
number of supported name spaces. Second, it assumes that there will
never be more than 16 name spaces supported, but a certain drive I'm
testing reports 1024. It assumes that name spaces are a tightly packed
namespace, but the standard seems to indicate otherwise. Finally, it
assumes that an error would be generated when quearying an
unconfigured namespace. Instead, it succeeds but the identify data is
all zeros.
Fix these by limiting the number of name spaces we probe to 16. Remove
aborting when we find one in error. When the size of the name space is
zero, ignore it.
This is admittedly a bandaide. The long term fix will be to
participate in the enumeration and name space change protocols
definfed in the NVNe standard.
mav [Thu, 1 Feb 2018 16:24:03 +0000 (16:24 +0000)]
MFC r313113 (by imp):
Ensure that the passthrough request will fit in MAXPHYS bytes after it
has been rounded to full pages. This avoids a panic in
vm_fault_quick_hold_pages due to this off-by-one error passing one
page too many into vmapbuf.
mav [Thu, 1 Feb 2018 16:22:28 +0000 (16:22 +0000)]
MFC r308855 (by imp):
Implement HGST Log page 0xc1, as documented in the HGST SN100 and
SN150 product manuals. Subpage 0x32 is documented, but not implemented.
mav [Thu, 1 Feb 2018 16:13:28 +0000 (16:13 +0000)]
MFC r308850 (by imp):
Print numbers instead of hex values for smart data. The full 128-bit
number is printed, even though you'd need like a billion IOPs for a 10
billion seconds to overflow the 64-bit counters (~300 years).
mav [Thu, 1 Feb 2018 15:46:19 +0000 (15:46 +0000)]
MFC r308431 (by scottl):
Convert the Q-Pair and PRP list memory allocations to use BUSDMA. Add a
bunch of safery belts and error handling in related codepaths.