Alexander Motin [Tue, 8 Jul 2014 12:15:15 +0000 (12:15 +0000)]
Return task management requests to queued execution, but differently.
Testing has shown that both the original queued design with a separate
task queue and the recent direct execution design had a significant flaw:
if an abort request arrives just after the victim, the victim may not be
in the ooa_queue yet, and so is invisible to the task management function.
Unlike the original queued implementation, use the same queue for all SCSI
and TASK requests from the same initiator. That avoids races between them:
task functions are always executed at the proper time relative to other
requests.
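A minimal sketch of the ordering argument, as a toy Python model rather
than the CTL code. The class and method names are hypothetical; only the
idea of one FIFO per initiator and of an "ooa" table of started commands
comes from the text above.

```python
from collections import deque

class InitiatorQueue:
    """Toy model: one FIFO shared by SCSI and task management requests."""

    def __init__(self):
        self.pending = deque()  # requests not yet processed, in arrival order
        self.ooa = {}           # tag -> state of commands already started
        self.log = []

    def submit(self, kind, tag):
        self.pending.append((kind, tag))

    def run(self):
        # Requests are handled strictly in arrival order, so an abort is
        # only looked at after every earlier command has been started.
        while self.pending:
            kind, tag = self.pending.popleft()
            if kind == "scsi":
                self.ooa[tag] = "running"
                self.log.append(f"start {tag}")
            elif kind == "abort":
                if tag in self.ooa:
                    self.ooa[tag] = "aborted"
                    self.log.append(f"abort {tag}: found")
                else:
                    self.log.append(f"abort {tag}: not found")

q = InitiatorQueue()
q.submit("scsi", 7)    # the victim
q.submit("abort", 7)   # arrives just after the victim
q.run()
```

Because the abort sits behind its victim in the same queue, the victim is
always visible by the time the abort is processed.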
Correct the problem reported by test16 from
tools/regression/file/flock/flock.c, which completes the fix in
r192685. When the lock was stolen from us, retry the whole lock
sequence in kernel, instead of returning EINTR to usermode and hoping
that application would handle it correctly by restarting the lock
acquire.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Warner Losh [Mon, 7 Jul 2014 23:21:25 +0000 (23:21 +0000)]
xdev builds libsupc++ and libstdc++ in a slightly strange way. This
causes a race to be exposed between the two. Compensate for this race
by serializing the build/install of libstdc++ before libsupc++.
Warner Losh [Mon, 7 Jul 2014 23:21:20 +0000 (23:21 +0000)]
rm -rf can fail sometimes with an error from fts_read. Make it honor
fflag to ignore fts_read errors, but stop deleting from that directory
because no further progress can be made.
When building a kernel with a high -j value on a high core count
machine, during the cleanobj phase we can wind up doing multiple rm
-rf at the same time for modules that have subdirectories. This
exposed this race (sometimes) as fts_read can return an error if the
directory is removed by another rm -rf. Since the intent of the -f
flag was to ignore errors, even if this was a bug in fts_read, we
should ignore the error like we've been instructed to do.
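The behaviour described above can be illustrated with a small Python
analogue. This is only a sketch: it uses os.scandir() instead of fts(3),
and remove_tree() is a hypothetical helper, not the rm(1) code.

```python
import os
import tempfile

def remove_tree(path, force=False):
    """Remove path recursively; with force, ignore traversal errors.

    Returns True if the whole tree was removed, False if we gave up on it.
    """
    try:
        entries = list(os.scandir(path))
    except OSError:
        if force:
            return False  # stop working on this directory, report no error
        raise
    for entry in entries:
        try:
            if entry.is_dir(follow_symlinks=False):
                remove_tree(entry.path, force)
            else:
                os.unlink(entry.path)
        except OSError:
            if not force:
                raise
    try:
        os.rmdir(path)
    except OSError:
        if force:
            return False
        raise
    return True

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "mod", "sub"))
open(os.path.join(root, "mod", "sub", "a.o"), "w").close()
removed = remove_tree(root, force=True)
# The tree is already gone, as if a concurrent rm -rf had won the race.
again = remove_tree(root, force=True)
```

The second call fails to read the directory, but with force set it simply
stops and reports no error.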
Warner Losh [Mon, 7 Jul 2014 23:21:07 +0000 (23:21 +0000)]
Naughty NANDFS was using hidden unused flag, hiding the fact that the
flag was used and wasn't really available. Change the name without
fixing any underlying issues that might be present in NANDFS' use of this
flag.
Alexander Motin [Mon, 7 Jul 2014 09:37:22 +0000 (09:37 +0000)]
Teach ctl_add_initiator() to dynamically allocate IIDs from pool.
If the port passes a negative IID value, the function will try to allocate
an IID from the pool of unused ones, based on the passed wwpn or name
arguments. It does its best to make the IID unique and persistent across
reconnects.
This makes persistent reservation work properly for iSCSI. Previously,
in case of reconnects, a reservation could be unexpectedly lost, or even
migrate between initiators.
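One way such an allocation policy could look, as a hedged Python sketch.
The IidPool class and its three-step preference order are assumptions made
for illustration; the actual ctl_add_initiator() logic may differ.

```python
class IidPool:
    """Toy pool of initiator IDs keyed by the initiator's identity."""

    def __init__(self, size):
        self.slots = [None] * size    # slot index is the IID
        self.in_use = [False] * size

    def add_initiator(self, iid, wwpn=None, name=None):
        ident = wwpn if wwpn is not None else name
        if iid < 0:
            iid = self._pick(ident)
            if iid is None:
                return -1             # pool exhausted
        self.slots[iid] = ident
        self.in_use[iid] = True
        return iid

    def remove_initiator(self, iid):
        self.in_use[iid] = False      # keep the identity for later reuse

    def _pick(self, ident):
        # 1. Prefer the free slot this identity used before.
        for i, old in enumerate(self.slots):
            if old == ident and not self.in_use[i]:
                return i
        # 2. Otherwise take a slot that was never used.
        for i, old in enumerate(self.slots):
            if old is None and not self.in_use[i]:
                return i
        # 3. Otherwise recycle any free slot.
        for i in range(len(self.slots)):
            if not self.in_use[i]:
                return i
        return None

pool = IidPool(4)
a = pool.add_initiator(-1, name="iqn.2014-07.example:host-a")
b = pool.add_initiator(-1, name="iqn.2014-07.example:host-b")
pool.remove_initiator(a)              # host-a disconnects
c = pool.add_initiator(-1, name="iqn.2014-07.example:host-c")
a2 = pool.add_initiator(-1, name="iqn.2014-07.example:host-a")
```

Here host-a gets its old IID back after reconnecting, and host-c is steered
to a fresh slot, so a reservation tied to the IID neither disappears nor
moves to another initiator.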
Fabien Thomas [Mon, 7 Jul 2014 08:22:39 +0000 (08:22 +0000)]
Optimizations and fixes for the mge driver:
- add missing rcvif in mbuf
- add missing ipacket stat
- remove unnecessary mbuf copy on output path
- fix deadlock of the TX engine in case of error
Alexander Motin [Mon, 7 Jul 2014 05:48:11 +0000 (05:48 +0000)]
When a new connection comes in, check whether we already have a session
from the same initiator (Name+ISID). If so -- terminate the old session and
let the new one take its place, as required by the iSCSI RFC.
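The check can be sketched as a lookup keyed by (Name, ISID). This is a toy
Python model with hypothetical names, not the CTL iSCSI frontend code.

```python
class Target:
    """Toy target that keeps at most one session per (name, ISID) pair."""

    def __init__(self):
        self.sessions = {}      # (initiator_name, isid) -> session id
        self.terminated = []    # ids of sessions replaced by a newer login
        self.next_id = 1

    def login(self, initiator_name, isid):
        key = (initiator_name, isid)
        old = self.sessions.pop(key, None)
        if old is not None:
            self.terminated.append(old)   # old session gives way to the new
        sid = self.next_id
        self.next_id += 1
        self.sessions[key] = sid
        return sid

t = Target()
first = t.login("iqn.2014-07.example:host-a", 0x1)
other = t.login("iqn.2014-07.example:host-a", 0x2)   # different ISID
second = t.login("iqn.2014-07.example:host-a", 0x1)  # same Name+ISID
```

Only the login that repeats both the name and the ISID displaces an
existing session.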
This includes:
o All directories named *ia64*
o All files named *ia64*
o All ia64-specific code guarded by __ia64__
o All ia64-specific makefile logic
o Mention of ia64 in comments and documentation
This excludes:
o Everything under contrib/
o Everything under crypto/
o sys/xen/interface
o sys/sys/elf_common.h
Alan Cox [Sun, 6 Jul 2014 17:42:38 +0000 (17:42 +0000)]
Introduce pmap_unwire(). It will replace pmap_change_wiring(). There are
several reasons for this change:
pmap_change_wiring() has never (in my memory) been used to set the wired
attribute on a virtual page. We have always used pmap_enter() to do that.
Moreover, it is not really safe to use pmap_change_wiring() to set the wired
attribute on a virtual page. The description of pmap_change_wiring() says
that it assumes the existence of a mapping in the pmap. However, non-wired
mappings may be reclaimed by the pmap at any time. (See pmap_collect().)
Many implementations of pmap_change_wiring() will crash if the mapping does
not exist.
pmap_unwire() accepts a range of virtual addresses, whereas
pmap_change_wiring() acts upon a single virtual page. Since we are
typically unwiring a range of virtual addresses, pmap_unwire() will be more
efficient. Moreover, pmap_unwire() allows us to unwire superpage mappings.
Previously, we were forced to demote the superpage mapping, because
pmap_change_wiring() only allowed us to express the unwiring of a single
base page mapping at a time. This added to the overhead of unwiring for
large ranges of addresses, including the implicit unwiring that occurs at
process termination.
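To illustrate the efficiency argument, here is a toy Python model of a
range-based unwire. The Mapping class, the 2 MB superpage size and
unwire_range() are assumptions for the sketch; mappings only partially
covered by the range are simply ignored here, which the real pmap code
cannot do.

```python
PAGE = 4096
SUPERPAGE = 2 * 1024 * 1024   # assumed superpage size for the illustration

class Mapping:
    def __init__(self, va, size, wired=True):
        self.va, self.size, self.wired = va, size, wired

def unwire_range(mappings, start, end):
    """Clear the wired attribute on every mapping fully inside [start, end).

    Returns the number of mappings touched.
    """
    touched = 0
    for m in mappings:
        if m.wired and start <= m.va and m.va + m.size <= end:
            m.wired = False
            touched += 1
    return touched

mappings = [Mapping(0, SUPERPAGE), Mapping(SUPERPAGE, PAGE)]
end = SUPERPAGE + PAGE
touched = unwire_range(mappings, 0, end)
per_page_calls = end // PAGE  # calls a one-page-at-a-time interface needs
```

One range call updates two mappings and leaves the superpage intact, where
a per-page interface would need one call for every base page in the range.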
Alexander Motin [Sun, 6 Jul 2014 17:37:49 +0000 (17:37 +0000)]
Make iSCSI initiator keep Initiator Session ID (ISID) across reconnects.
Previously the ISID was changed every time, which made correct persistent
reservation impossible, because a reconnected session was identified as a
completely new one.
Fix OFED startup order: All SYSINIT()'s and modules should be loaded
prior to starting "/sbin/init" which will run all the "/etc/rc.d/xxx"
scripts. Else there can be a race configuring the interfaces via
"/etc/rc.conf".
Andrew Turner [Sun, 6 Jul 2014 10:24:06 +0000 (10:24 +0000)]
Align the stack in _rtld_bind_start. Normally this is called with the
correct stack alignment, however when we have a leaf function that uses
thread local storage it calls __aeabi_read_tp to get the thread pointer.
Neither GCC nor clang sees this as a function call, so they will align the
stack to a 4-byte boundary. This may be a problem as _rtld_bind expects to be
on an 8-byte boundary.
The solution is to store a copy of the stack pointer and force the
alignment before calling _rtld_bind.
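The alignment step is plain bit masking. The Python lines below only show
the arithmetic; the real fix is ARM assembly, and the stack pointer value
is made up.

```python
def align_down(sp, boundary=8):
    """Round sp down to a multiple of boundary (a power of two)."""
    return sp & ~(boundary - 1)

saved_sp = 0x7FFF1234                # hypothetical, only 4-byte aligned
aligned_sp = align_down(saved_sp)    # value used while calling the binder
```

Keeping saved_sp around lets the original stack pointer be restored after
the call, which is why a copy is stored first.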
This fixes a problem with armeb where applications would crash in odd ways.
It should also remove the need for a local patch to clang to force the
stack alignment to an 8-byte boundary, even for leaf functions. Further
testing will be needed before reverting this local change to clang as we
may rely on it in other places.
Alexander Motin [Sat, 5 Jul 2014 19:30:20 +0000 (19:30 +0000)]
Bury the devid port method, which was a gross hack.
Instead, make ports provide the wanted port and target IDs, and LUNs
provide the wanted LUN IDs. After that, the core Device ID VPD code only
has to link all of them together and add relative port and port group
numbers.
The LUN ID for iSCSI LUNs is no longer created by CTL, but by ctld, and is
passed to CTL as the "scsiname" LUN option. This makes a LUN report the
same set of IDs regardless of the port through which it is accessed, as
required by the SCSI specifications.
Alexander Motin [Sat, 5 Jul 2014 18:15:00 +0000 (18:15 +0000)]
Create separate CTL port for every iSCSI target (and maybe portal group).
Having a single port for all iSCSI connections makes it problematic to
implement some more advanced SCSI functionality in CTL that requires proper
port enumeration and identification.
This change extends the CTL iSCSI API, making the ctld daemon control the
list of iSCSI ports in CTL. When a new target is defined in the config
file, ctld will create the respective port in CTL. When a target is
removed, the port will also be removed after all active commands through
that port are properly aborted.
This change requires ctld to be rebuilt to match the kernel.
As a minor side effect, this allows iSCSI targets without LUNs.
While that may look odd and not very useful, that is not incorrect.
6679140 asymmetric alloc/dealloc activity can induce dynamic variable drops
6679193 dtrace_dynvar walker produces flood of dtrace_dynhash_sink
This finishes a set of merges from the older OpenSolaris releases.
Still the FreeBSD port has many differences that are difficult to
account for but that seems normal given that the kernels are different.
When getting the initial value of numeric tunables, use the
getenv_xxx() functions instead of strtoq(), because the getenv_xxx()
functions handle suffixes like G/M/K, which strtoq() does not.
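The difference is easy to see in a small sketch. parse_tunable() is a
hypothetical Python stand-in, not the kernel's getenv_xxx() code, and it
assumes the suffixes mean powers of 1024.

```python
def parse_tunable(text):
    """Parse an integer with an optional K, M or G suffix (powers of 1024)."""
    shifts = {"k": 10, "m": 20, "g": 30}
    text = text.strip()
    suffix = text[-1:].lower()
    if suffix in shifts:
        return int(text[:-1], 0) << shifts[suffix]
    return int(text, 0)   # plain number, no suffix

plain = parse_tunable("512")
kilo = parse_tunable("4K")
mega = parse_tunable("2M")
giga = parse_tunable("1G")
```

A plain string-to-integer conversion would stop at the suffix and return
only the leading digits.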
units(1): Convert units.lib to use '#' instead of '/'.
This allows us to run GNU units against our data files and compare the output.
In addition, current units(1) does not support '/' as a comment at all.
units: Support start of line comments with '#'
Modern GNU units(1) supports comments anywhere with '#', but take the easy
route for now and at least support start-of-line '#' comments.
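Start-of-line comment handling amounts to a check on the first character
of each line. The sketch below is Python, not the units(1) source, and the
sample entries are invented.

```python
def read_unit_lines(text):
    """Yield non-empty lines, skipping those whose first character is '#'."""
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        yield line

sample = "# lengths\ninch 2.54 cm\n\n#foot 12 inch\nyard 3 foot\n"
entries = list(read_unit_lines(sample))
```

A '#' later in a line is left alone by this sketch, matching the "easy
route" described above.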
Peter Grehan [Sat, 5 Jul 2014 02:38:53 +0000 (02:38 +0000)]
Extend capabilities to 64-bits in preparation for some API changes.
The v1.0 virtio spec supports an extended size for guest/host
caps, but in practice 64-bits should last for a long time.
Rick Macklem [Fri, 4 Jul 2014 22:47:07 +0000 (22:47 +0000)]
The new NFSv3 server did not generate directory postop attributes for
the reply to ReaddirPlus when the server failed within the loop
that calls VFS_VGET(). This failure is most likely an error
return from VFS_VGET() caused by a bogus d_fileno that was
truncated to 32 bits.
This patch fixes the server so that it will return directory postop
attributes for the failure. It does not fix the underlying issue caused
by d_fileno being uint32_t when a file system like ZFS generates
a fileno that does not fit in 32 bits.
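The truncation itself can be shown with a mask. The file number below is
made up, and the Python lines only mimic what storing a wider value in a
uint32_t does.

```python
def truncate_fileno(fileno):
    """Mimic storing a wider file number in a 32-bit unsigned field."""
    return fileno & 0xFFFFFFFF

real_fileno = (1 << 32) + 5           # hypothetical 33-bit file number
stored = truncate_fileno(real_fileno)
small = truncate_fileno(12345)        # values that fit are unchanged
```

A lookup using the stored value would refer to a different file number
than the one the file system generated, which is how the bogus d_fileno
arises.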
Alexander Motin [Fri, 4 Jul 2014 19:27:06 +0000 (19:27 +0000)]
Separate concepts of frontend and port.
Before the iSCSI implementation, CTL had no knowledge of frontend drivers;
it had only frontends, which really were ports (similar to LUNs, if
comparing to backends). But iSCSI added an ioctl() method there, which does
not belong to a frontend as a port, but belongs to a frontend driver.
After EFI support was added to the installer, it needed to allow boot
partitions of types other than "freebsd-boot" (in particular, "efi").
This allows the removal of some nasty hacks for supporting PowerPC systems,
in particular aliasing freebsd-boot to apple-boot on APM and an IBM-specific
code on MBR.
This changes the installer to use the correct names, which also breaks a
degeneracy in the meaning of "freebsd-boot" that allows the addition
of support for some newer IBM systems that can boot from GPT in addition to
MBR. Since I have no idea how to detect which those systems are, leave
the default on IBM PPC systems as MBR for now.
BREAK_TO_DEBUGGER is not just serial console anymore, it controls every
console's ability to enter the debugger.... rwatson forgot to document
this when he changed it back in 2011... There are more docs to write
about this, but at least fix this for now...
Adapt to current and other changes:
use dedicated kernel files with some local settings
use mkimg for ISO building
put images into separate directory and rename them for better consistency
Add persistent reservation support to camcontrol(8).
camcontrol(8) now supports a new 'persist' subcommand that allows users to
issue SCSI PERSISTENT RESERVE IN / OUT commands.
sbin/camcontrol/Makefile:
Add persist.c.
sbin/camcontrol/persist.c:
New persistent reservation support for camcontrol(8).
We have support for all known operation modes for PERSISTENT RESERVE
IN and PERSISTENT RESERVE OUT.
exceptions noted above.
sbin/camcontrol/camcontrol.8:
Document the new 'persist' subcommand.
In the section on the Transport ID (-I) option, explain what
Transport IDs for each protocol should look like. At some point
some of this information could probably get moved off in a
separate man page, either on Transport IDs alone or a man page
documenting the Transport ID parsing code.
Add a number of examples of persistent reservation commands.
Persistent Reservations are complex enough that the average user
probably won't be able to get the commands exactly right by just
reading the man page. These examples show a few basic and
advanced examples of how to use persistent reservations.
sbin/camcontrol/camcontrol.h:
Move the definition for camcontrol_optret here, so we can use it
for the persistent reservation code.
Add a definition for the new scsipersist() function.
sbin/camcontrol/camcontrol.c:
Add 'persist' to the list of subcommands.
Document 'persist' in the help text.
sys/cam/scsi/scsi_all.c:
Add the scsi_persistent_reserve_in() and
scsi_persistent_reserve_out() CCB building functions.
Add a new function, scsi_transportid_sbuf(). This takes a
SCSI Transport ID (documented in SPC-4), and prints it to
an sbuf(9). There are some transports (like ATA, USB, and
SSA) for which there is no Transport ID defined. We need to
come up with a reasonable thing to do if we're presented
with a Transport ID that claims to be for one of those
protocols.
Add new routines scsi_get_nv() and scsi_nv_to_str().
These functions do a table lookup to go between a string and an
integer. There are lots of table lookups needed in the
persistent reservation code in camcontrol(8).
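A name/value table lookup of this kind might look like the following
Python sketch. The table layout, the entry names and both helper functions
are hypothetical and are not the signatures of scsi_get_nv() or
scsi_nv_to_str(); the numeric codes are the PERSISTENT RESERVE IN service
actions.

```python
# Hypothetical table of (name, value) pairs for the illustration.
PRIN_ACTIONS = [
    ("read_keys", 0x00),
    ("read_reservation", 0x01),
    ("report_capabilities", 0x02),
    ("read_full_status", 0x03),
]

def name_to_value(table, name):
    """Return the value for name (case-insensitive), or None if unknown."""
    wanted = name.lower()
    for entry_name, value in table:
        if entry_name == wanted:
            return value
    return None

def value_to_name(table, value):
    """Return the name for value, or None if the value is not in the table."""
    for entry_name, entry_value in table:
        if entry_value == value:
            return entry_name
    return None

action = name_to_value(PRIN_ACTIONS, "Read_Full_Status")
label = value_to_name(PRIN_ACTIONS, 0x01)
missing = name_to_value(PRIN_ACTIONS, "no_such_action")
```

Keeping both directions on one table means the command-line parser and
the output code cannot drift apart.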
Add a new function, scsi_parse_transportid(), along with leaf node
functions to parse:
FC, 1394 and SAS (scsi_parse_transportid_64bit())
iSCSI (scsi_parse_transportid_iscsi())
SPI (scsi_parse_transportid_spi())
RDMA (scsi_parse_transportid_rdma())
PCIe (scsi_parse_transportid_sop())
Transport IDs. Given a string with the general form proto,id these
functions create a SCSI Transport ID structure.
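As a rough illustration of the proto,id form, the Python sketch below
splits the string and picks a leaf parser by protocol. The protocol
keywords in the table are assumptions; only the grouping of protocols
under each leaf function comes from the list above.

```python
# Protocol keyword -> leaf parser named in the commit message.
# The keywords themselves are assumed for the illustration.
LEAF_PARSERS = {
    "fcp": "scsi_parse_transportid_64bit",
    "sbp": "scsi_parse_transportid_64bit",   # 1394
    "sas": "scsi_parse_transportid_64bit",
    "iscsi": "scsi_parse_transportid_iscsi",
    "spi": "scsi_parse_transportid_spi",
    "srp": "scsi_parse_transportid_rdma",    # RDMA
    "sop": "scsi_parse_transportid_sop",     # PCIe
}

def split_transportid(text):
    """Split 'proto,id' at the first comma and look up the leaf parser."""
    proto, sep, ident = text.partition(",")
    proto, ident = proto.strip().lower(), ident.strip()
    if not sep or not proto or not ident:
        raise ValueError("expected proto,id")
    if proto not in LEAF_PARSERS:
        raise ValueError("unknown protocol: " + proto)
    return proto, ident, LEAF_PARSERS[proto]

parsed = split_transportid("sas,0x1234567812345678")
```

Splitting only at the first comma leaves any further commas in the id
portion for the leaf parser to interpret.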
sys/cam/scsi/scsi_all.h:
Update the various persistent reservation data structures to
SPC4r36l, but also rename some fields that were previously
obsolete with the proper names from older SCSI specs. This
allows using older, obsolete persistent reservation types when
desired.
Add function prototypes for the new persistent reservation CCB
building functions.
Add a data structure for the READ FULL STATUS service action
of the PERSISTENT RESERVE IN command.
Add Transport ID structures for all protocols described in SPC-4.
Add a new series of SCSI_PROTO_XXX definitions, and
redefine other defines in terms of these new definitions.
Add a prototype for scsi_transportid_sbuf().
Change a couple of "obsolete" persistent reservation data
structure fields into something more meaningful, based on
what the field was called when it was defined in the spec.
(e.g. SPC, SPC-2, etc.)
Create a new define, SPRI_MAX_LEN, for the maximum allocation
length allowed for the PERSISTENT RESERVE IN command.
Add data structures and enumerations for the new name/value
translation functions.
Add data structures for SCSI over PCIe Routing IDs.
Bring the PERSISTENT RESERVE OUT Register and Move parameter list
structure (struct scsi_per_res_out_parms) up to date with SPC-4.
Add a data structure for the transport IDs that can optionally be
appended to the basic PERSISTENT RESERVE OUT parameter list.
Move SCSI protocol macro definitions out of the VPD page 0x83
definition and combine them with the more up to date protocol
definitions higher in the file.
Add function prototypes for scsi_nv_to_str(), scsi_get_nv(),
scsi_parse_transportid_64bit(), scsi_parse_transportid_spi(),
scsi_parse_transportid_rdma(), scsi_parse_transportid_iscsi(),
scsi_parse_transportid_sop(), and scsi_parse_transportid().
Add VHD support to mkimg(1). VHD is used by Xen and Microsoft's Hyper-V
among others.
Add an undocumented option for unit testing (-y). When given, the image
will have UUIDs and timestamps synthesized in a way that gives identical
results across runs. As such, UUIDs stop being unique, globally or
otherwise.
Properly advertise that if_arge can handle long frames (if_arge is set to
handle packets up to 1536 bytes)
This removes the need to fragment that could arise when using vlans on top
of if_arge (which is a common case when using the switch ports as
individual NICs).
Prior to this commit, any vlan setup with if_arge as parent would have
the MTU of the parent interface reduced by the size of the dot1q header
(4 bytes).
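The MTU arithmetic, as a short Python sketch. vlan_mtu() and its boolean
parameter are hypothetical; the 4-byte tag size is from the text above and
1500 is the usual Ethernet MTU.

```python
ETHER_MTU = 1500
DOT1Q_HEADER = 4   # size of the 802.1Q tag in bytes

def vlan_mtu(parent_mtu, parent_handles_long_frames):
    """MTU a vlan interface gets on top of the given parent."""
    if parent_handles_long_frames:
        return parent_mtu             # the tag fits in the larger frame
    return parent_mtu - DOT1Q_HEADER  # make room for the tag

before = vlan_mtu(ETHER_MTU, False)   # parent not advertising long frames
after = vlan_mtu(ETHER_MTU, True)     # parent advertising long frames
```

With the reduced MTU, full-size packets routed onto the vlan had to be
fragmented; with the full MTU they pass unchanged.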
Tested on TP-Link 1043ND (where the WAN port is just a switch port setup to
tag packets in a different VLAN than the LAN ports).
Reported and tested by: Harm Weites (harm at weites.com)
The u-boot tarball needed for some boards, BEAGLEBONE for
example, explicitly hard-codes gcc(1) as the compiler.
Partially revert r264703, which did a post-chroot install
of gcc(1). This was initially removed because gcc(1) fails
to build usr.bin/dtc/ causing the xdev target to fail. So
this time, move the gcc(1) installation after xdev is built.
This change is likely applicable to stable/10 arm build
failures, as well.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation