hrs [Fri, 2 Aug 2013 03:46:45 +0000 (03:46 +0000)]
MFC 253751 and 253843:
- Relax the restriction on the member interfaces with LLAs. Two or more
LLAs on the member interfaces are actually harmless when the parent
interface does not have an LLA.
- Add net.link.bridge.allow_llz_overlap. This is a knob to allow LLAs on
a bridge and the member interfaces at the same time. The default is 0.
mav [Thu, 1 Aug 2013 09:42:17 +0000 (09:42 +0000)]
MFC r253754:
Partially close the race between calls of the orphan() method from GEOM and
the close() method from the ZFS core, which reliably causes a use-after-free
panic if an SSD vdev is detached during the initial erase.
marius [Wed, 31 Jul 2013 11:36:20 +0000 (11:36 +0000)]
Revert r249530 and re-enable compilation of ctl.ko for all configurations
except i386 XEN, for which it still doesn't build so far.
This is a direct commit to stable/9.
+ Add "-f" to also output filemon(4) information.
+ Add d, p and r switches for recording script sessions with timing data
and playing sessions back with or without time delays.
+ Remove contractions.
MFC r253554:
Fix a panic in the racct code when munlock(2) is called with incorrect values.
The racct code in sys_munlock() assumed that the boundaries provided by
the userland were correct as long as vm_map_unwire() returned
successfully. However the latter contains its own logic and sometimes
manages to do something out of those boundaries, even if they are buggy.
This change makes the racct code use the accounting done by the vm
layer, as is done in other places such as vm_mlock().
Despite fixing the panic, Alan Cox pointed out that this code is still
race-y: two simultaneous callers will produce incorrect values.
Reviewed by: alc
MFC r253556:
Fix previous commit when option RACCT is not used.
marius [Tue, 30 Jul 2013 10:37:11 +0000 (10:37 +0000)]
MFC: 253676
- Once we have shifted arguments thrice, base-bits-dir is $1 rather than $4.
Introduce $BASEBITSDIR for clarity and in order to avoid repeating this
mistake in the future. Fixing this ensures that we pick up the newly built
boot code and loader native to the target, which is especially relevant
when cross-building release images.
- It is pointless to specify an endianness for ISO 9660 images so strip that.
marius [Tue, 30 Jul 2013 10:24:09 +0000 (10:24 +0000)]
MFC: r253675
Ensure that makefs.h is included when using ufs_bswap.h so the FFS_EI macro
is picked up when defined. Previously, ffs_subr.c was always built without
support for opposite endianness as it doesn't include makefs.h on its own.
marius [Tue, 30 Jul 2013 10:22:08 +0000 (10:22 +0000)]
MFC: r253707
- Set the System Identifier in the Primary Volume Descriptor to FreeBSD
rather than NetBSD.
- Correctly set the Expiration Time in the Primary Volume Descriptor;
according to ISO 9660 8.4.26.1 unspecified date and time are denoted
by the digit 0 in RBP 1 to 16 but the number 0 in RBP 17. [1]
- Merge iso9660_rrip.c rev. 1.11 from NetBSD: name_len should be read
as unsigned byte. [2]
Note: This is according to ISO 9660 9.1.10.
- Rock Ridge TF entries should use a length of 5, because after the 4
bytes of generic SUSP header there is one byte of flags. See typedef
of ISO_RRIP_TF in iso9660_rrip.h. [1]
Submitted by: Thomas Schmitt [1]
Approved by: re (kib)
Obtained from: NetBSD [2]
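The "unspecified" Expiration Time encoding described above can be sketched
as follows. This is an illustration in Python, not the makefs code; the
helper name unspecified_volume_timestamp is hypothetical, and the 17-byte
layout is taken from the commit text's reading of ISO 9660 8.4.26.1.

```python
def unspecified_volume_timestamp() -> bytes:
    """Return a 17-byte date/time field meaning "not specified".

    RBP 1 to 16 hold the ASCII digit '0' (0x30); RBP 17 holds the
    number zero (0x00), not the character '0'.
    """
    digits = b"0" * 16        # RBP 1 to 16: the digit 0
    last_byte = bytes([0])    # RBP 17: the number 0
    return digits + last_byte


field = unspecified_volume_timestamp()
print(len(field), field)
```

The distinction matters because filling all 17 bytes with the character
'0' would leave a non-zero value (0x30) in the final, numeric byte.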
Fix multiple kernel panics when VIMAGE is enabled in the kernel.
These fixes are based on patches submitted by Adrian Chadd and Marko Zec.
(1) Set curthread->td_vnet to vnet0 in device_probe_and_attach() just before calling
device_attach(). This fixes multiple VIMAGE related kernel panics
when trying to attach Bluetooth or USB Ethernet devices because
curthread->td_vnet is NULL.
(2) Set curthread->td_vnet in if_detach(). This fixes kernel panics when detaching networking
interfaces, especially USB Ethernet devices.
(3) Use VNET_DOMAIN_SET() in ng_btsocket.c
(4) In ng_unref_node() set curthread->td_vnet. This fixes kernel panics
when detaching Netgraph nodes.
MFC r253404:
o TxD ring requires 8 bytes alignment to work so change alignment
constraint to 8. Previously it may have triggered watchdog
timeouts.
o Check whether interrupt is ours or not.
o Enable interrupts before attempting to transmit queued packets.
This will slightly improve TX performance.
o No need to clear IFF_DRV_OACTIVE in a loop. AE_FLAG_TXAVAIL is
used to know whether there is enough available TxD ring space.
o Added missing bus_dmamap_sync(9) in ae_rx_intr() and rearranged
code to avoid unnecessary register access.
o Make sure to clear TxD, TxS, RxD rings in driver initialization.
Otherwise some data in these rings could be interpreted as
'updated' which in turn will advance internally maintained
pointers and can trigger watchdog timeouts.
There is a bug in the ACPICA version 20110527 that is used in stable/9.
The bug can lead to unprotected reference counting on ACPI objects
and eventually to a crash or a memory corruption. The bug has been fixed
upstream and imported to head as of ACPICA version 20130328.
Unfortunately, ACPICA version in stable has not been updated,
so merging all past ACPICA versions or cherry-picking parts of 20130328
would be a big change with a risk of potential regressions.
During debugging it was determined that the most probable vector for the
bug was through concurrent calls to ACPI battery sysctls and ioctls.
The sysctls are already guarded by Giant (not MPSAFE), but ioctls could
execute in parallel to a sysctl call and to each other. All the calls
go through acpi_battery_ioctl function, which makes the actual calls
into ACPICA and those are the calls that lack necessary protection.
Thus preventing concurrency in acpi_battery_ioctl should prevent the
conditions that triggered the ACPICA bug.
Some additional details can be found in this thread:
http://thread.gmane.org/gmane.os.freebsd.devel.acpi/7707/focus=7774
Tested by: kron24@gmail.com,
David Demelier <demelier.david@gmail.com>
Approved by: re (kib)
Add message when nvd disks are attached and detached.
As part of this commit, add an nvme_strvis() function which borrows
heavily from cam_strvis(). This will allow stripping of
leading/trailing whitespace and also handle unprintable characters
in model/serial numbers. This function goes into a new nvme_util.c
file which is used by both the driver and nvmecontrol.
Fix nvme(4) and nvd(4) to support non 512-byte sector sizes.
Recent testing with QEMU that has variable sector size support for
NVMe uncovered some of these issues. Chatham prototype boards supported
only 512 byte sectors.
Use pause() instead of DELAY() when polling for completion of admin
commands during controller initialization.
DELAY() does not work here during config_intrhook context - we need to
explicitly relinquish the CPU for the admin command completion to
get processed.
Do not throw an error if the user requests to activate the image from
an empty firmware slot, as long as the user has specified a firmware
image to download into the empty firmware slot.
o Add devel/subversion and devel/subversion-static so we provide
a package for source-based users to check out the various trees,
both with and without extra dependencies.
This is a direct commit to stable/9.
Approved by: kib (mentor)
Approved by: re (glebius)
MFC: r253506
The NFSv4 server incorrectly assumed that the high order words of
the attribute bitmap argument would be non-zero. This caused an
interoperability problem for a recent patch to the Linux NFSv4 client.
The Linux folks have changed their patch to avoid this, but this
patch fixes the problem on the server.
Reported and tested by: a.heider@gmail.com (Andre Heider)
Approved by: re (Xin Li)
1) POSIX requires rand(3) return values to be in the [0, RAND_MAX]
range, but the ACM formula we use has its internal state (and return
value) in the [1, 0x7ffffffe] range, so our RAND_MAX (0x7fffffff) is never
reached because it is off by one, and zero is never reached either.
Correct both RAND_MAX and the rand(3) return value by subtracting 1 from
the latter so that it starts at 0, resulting in the POSIX-conformant
[0, 0x7ffffffd (= new RAND_MAX)] range.
2) Add checks against overflow on seeds that are too big. This may happen
on machines where unsigned int is wider than 32 bits.
This change is binary compatible because the range is reduced, not expanded,
so no bump is needed.
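The range shift described above can be sketched as follows. This is a
Python model for illustration, not the libc source: the function name
do_rand and the use of a modulo to fold oversized seeds are assumptions,
while the multiplier 16807 and modulus 2^31 - 1 are the usual ACM
"minimal standard" constants.

```python
RAND_MAX = 0x7FFFFFFD  # new RAND_MAX quoted in the commit text


def do_rand(state: int) -> int:
    """Advance the generator once; input and output lie in [0, RAND_MAX]."""
    # Map the stored state into the formula's native [1, 0x7ffffffe] range.
    # The modulo also folds seeds that are too big (an assumed guard).
    x = (state % 0x7FFFFFFE) + 1
    # ACM minimal standard step: x = 16807 * x mod (2^31 - 1).
    x = (16807 * x) % 0x7FFFFFFF
    # Shift down by one so that 0 is reachable and the top value is RAND_MAX.
    return x - 1


state = 0
for _ in range(3):
    state = do_rand(state)
    print(state)
```

Because the modulus is prime and x is never a multiple of it, the
unshifted result is always in [1, 0x7ffffffe], so subtracting one yields
exactly the [0, 0x7ffffffd] range.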
In this GRN, Marcel Moolenaar overhauled the logic for mounting
the root file system on bootup:
|------------------------------------------------------------------------
|r214006 | marcel | 2010-10-17 22:01:53 -0700 (Sun, 17 Oct 2010) | 20 lines
|
| Re-implement the root mount logic using a recursive approach, whereby each
|root file system (starting with devfs and a synthesized configuration) can
|contain directives for mounting another file system as root.
|------------------------------------------------------------------------
This commit adds a mount.conf(8) man page which documents
the root mount logic. mount.conf(8) also provides some examples
for the /.mount.conf file, which can be used to change the root mount behavior.
MFC r253280:
Only copy as many bytes as there are in the superblock, instead of
copying the full block, when copying the superblock into the snapshot.
UFS1 does not align the superblock on a block boundary, and bcopy runs
off the end of the buffer.
MFC 252576:
Don't perform the acpi_DeviceIsPresent() check for PCI-PCI bridges. If
we are probing a PCI-PCI bridge it is because we found one by enumerating
the devices on a PCI bus, so the bridge is definitely present. A few
BIOSes report an incorrect status (_STA) for some bridges, claiming they
are not present when in fact they are.
While here, move this check earlier for Host-PCI bridges so attach fails
before doing any work that needs to be torn down.
- Configurations in ipv6_prefix_IF should be recognized even if there is no
ifconfig_IF_ipv6.
- DAD wait should be performed at once, not on a per-interface basis, if
possible. This fixes an issue where a system with a lot of IPv6-capable
interfaces takes too long to boot.
Correct the Intel network driver module builds. They were not
defining INET or INET6; in the case of ixgbe this causes a panic
in the TSO setup code, and in all cases the ioctl behavior differs.
This change makes the module and static builds consistent.
When adjusting which registers to copy out for a mailbox command
and which registers to copy back in when the command completes,
the bits being set need to specify not only which bits you want
to add to the default from the table but also which bits you
want to *subtract* (mask) from the default from the table.
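The add-and-subtract idea can be sketched as a single bitmask expression.
This is an illustration in Python; the function and parameter names are
hypothetical and do not come from the driver.

```python
def effective_register_mask(default_mask: int, add_bits: int,
                            remove_bits: int) -> int:
    """Combine a table default with per-command additions and removals.

    Each bit stands for one mailbox register to copy out (or back in).
    """
    # Start from the table default, add the extra registers, then mask
    # off the registers this command must not touch.
    return (default_mask | add_bits) & ~remove_bits


# Default copies registers 0-2; this command also needs register 3
# but must skip register 0.
mask = effective_register_mask(0b0111, add_bits=0b1000, remove_bits=0b0001)
print(bin(mask))
```

With only an "add" set, there would be no way to express a command that
needs fewer registers than the table default.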
Fix a problem with READ ELEMENT STATUS that occurs on some
changers that don't support the DVCID and CURDATA bits that were
introduced in the SMC spec.
These changers will return an Illegal Request type error if the
bits are set. This causes "chio status" to fail.
The fix is two-fold. First, for changers that claim to be SCSI-2
or older, don't set the DVCID and CURDATA bits for READ ELEMENT
STATUS. For newer changers (SCSI-3 and newer), we default to
setting the new bits, but back off and try the READ ELEMENT STATUS
without the bits if we get an Illegal Request type error.
This has been tested on a Qualstar TLS-8211, which is a SCSI-2
changer that does not support the new bits, and a Spectra T-380,
which is a SCSI-3 changer that does support the new bits. In the
absence of a SCSI-3 changer that does not support the bits, I
tested that with some error injection code. (The SMC spec says
that support for CURDATA is mandatory, and DVCID is optional.)
scsi_ch.c: Add a new quirk, CH_Q_NO_DVCID that gets set for
SCSI-2 and older libraries, or newer libraries that
report errors when the DVCID/CURDATA bits are set.
In chgetelemstatus(), use the new quirk to
determine whether or not to set DVCID and CURDATA.
If we get an error with the bits set, back off and
try without the bits. Set the quirk flag if the
read element status succeeds without the bits set.
Increase the READ ELEMENT STATUS timeout to 60
seconds after testing with a Spectra T-380. The
previous value was 10 seconds, and too short for
the T-380. This may be decreased later after
some additional testing and investigation.
Tested by: Andre Albsmeier <Andre.Albsmeier@siemens.com>
Sponsored by: Spectra Logic
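The back-off logic described above can be sketched as follows. This is a
Python model for illustration only: the Changer class, IllegalRequest
exception and get_element_status function are hypothetical stand-ins, not
the structures or functions in scsi_ch.c.

```python
class IllegalRequest(Exception):
    """Stand-in for an Illegal Request type SCSI error."""


class Changer:
    """Hypothetical media changer model."""

    def __init__(self, scsi_rev: int, supports_new_bits: bool):
        self.supports_new_bits = supports_new_bits
        # The quirk starts out set for SCSI-2 and older devices.
        self.no_dvcid = scsi_rev <= 2
        self.commands_sent = []  # records whether each command set the bits

    def read_element_status(self, with_bits: bool) -> str:
        self.commands_sent.append(with_bits)
        if with_bits and not self.supports_new_bits:
            raise IllegalRequest()
        return "status"


def get_element_status(ch: Changer) -> str:
    """Issue READ ELEMENT STATUS, backing off if the new bits are rejected."""
    with_bits = not ch.no_dvcid
    try:
        return ch.read_element_status(with_bits)
    except IllegalRequest:
        if not with_bits:
            raise  # nothing left to back off to
        result = ch.read_element_status(False)
        # The retry succeeded without the bits, so remember the quirk.
        ch.no_dvcid = True
        return result


newer = Changer(scsi_rev=3, supports_new_bits=False)
get_element_status(newer)
get_element_status(newer)
print(newer.commands_sent)
```

Setting the quirk after the first successful retry means later requests
to the same changer skip the failing attempt entirely.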
MFC r253224:
Fix bug in deleting files: If two ports had the same tarball and one of
them changed (or was removed from the tree) then portsnap would delete
that file. This happened earlier today when one of two empty port
directories was removed. Uniquifying the lists of needed files fixes
this.
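One way to picture this failure is a difference taken over lists that
still contain duplicates. The sketch below is in Python and is an
assumption about how the faulty comparison behaved, not portsnap's actual
shell code; the function name and the tarball names are made up.

```python
from collections import Counter


def files_to_delete(old_files, new_files, uniquify: bool):
    """Return tarballs listed as needed before but no longer needed."""
    if uniquify:
        old_files, new_files = set(old_files), set(new_files)
    # Multiset difference: an entry survives if it appears more often
    # in the old list than in the new one.
    extra = Counter(old_files) - Counter(new_files)
    return sorted(extra)


# Two empty ports shared tarball "abc"; one of them was then removed.
old = ["abc", "abc", "def"]
new = ["abc", "def"]
print(files_to_delete(old, new, uniquify=False))  # "abc" wrongly selected
print(files_to_delete(old, new, uniquify=True))   # nothing selected
```

After uniquifying, a tarball is selected for deletion only when no
remaining port refers to it.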
Send per-namespace logpage commands to the controller devnode, so they
are processed as admin commands, not I/O commands.
As part of this change, pull out the code for parsing a namespace node
string into a separate function, since it is used for both identify and
logpage commands.
Try to read firmware image before prompting the user to confirm
firmware download. This correctly prints an error and exits for
an incorrect firmware image name before prompting the user to
confirm the download.
r253109:
Incorporate feedback from bde@ based on r252672 changes:
* Use 0/1 instead of sysexits. Man pages are confusing on this topic,
but 0/1 is sufficient for nvmecontrol.
* Use err function family where possible instead of fprintf/exit.
* Fix some typing errors.
* Clean up some error message inconsistencies.
r253279:
%d should be used for printing int32_t instead of %zd.
clang does not complain about this - only gcc.
Incorporated r253279 prior to MFC timeout because it is required for gcc
builds.
In r227207, to fix the issue with possible NULL inp_socket pointer
dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR
for multicast), INP_REUSEPORT flag was introduced to cache the socket
option. It was decided then that one flag would be enough to cache
both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR
setsockopt(2), it was checked if it was called for a multicast address
and INP_REUSEPORT was set accordingly.
Unfortunately that approach does not work when setsockopt(2) is called
before binding to a multicast address: the multicast check fails and
INP_REUSEPORT is not set.
Fix this by adding INP_REUSEADDR flag to unconditionally cache
SO_REUSEADDR.
PR: 179901
Submitted by: Michael Gmelin freebsd grem.de (initial version)
Reviewed by: rwatson
Approved by: re (kib)
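The ordering problem can be modeled with plain flag arithmetic. This
Python sketch is illustrative only: the bit values are invented, the
helper names are hypothetical, and the final check is a simplification of
what the kernel actually tests.

```python
# Illustrative bit values only; the kernel's actual flag values differ.
INP_REUSEPORT = 0x1
INP_REUSEADDR = 0x2


def cache_reuseaddr_old(flags: int, bound_to_multicast: bool) -> int:
    """Old behavior: SO_REUSEADDR is cached (as INP_REUSEPORT) only if the
    socket is already bound to a multicast address when the option is set."""
    return flags | INP_REUSEPORT if bound_to_multicast else flags


def cache_reuseaddr_new(flags: int) -> int:
    """New behavior: SO_REUSEADDR is always cached in its own flag."""
    return flags | INP_REUSEADDR


def multicast_reuse_allowed(flags: int) -> bool:
    """Later check for a multicast bind, using only the cached flags."""
    return bool(flags & (INP_REUSEPORT | INP_REUSEADDR))


# setsockopt(2) is called before the socket is bound to a multicast address.
old_flags = cache_reuseaddr_old(0, bound_to_multicast=False)
new_flags = cache_reuseaddr_new(0)
print(multicast_reuse_allowed(old_flags))  # option was lost
print(multicast_reuse_allowed(new_flags))  # option was remembered
```

Caching the option unconditionally removes the dependency on whether
bind(2) or setsockopt(2) happens first.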
marius [Fri, 12 Jul 2013 18:02:10 +0000 (18:02 +0000)]
MFC: r240981, r240990, r240992, r244695
Add 32-bit ABI compat shims. Those are necessary for i386 binary-only
tools like sysutils/hpacucli (HP P4xx RAID controller management
suite) working on amd64 systems.
marius [Fri, 12 Jul 2013 16:41:58 +0000 (16:41 +0000)]
MFC: r253120
- As it turns out, not only MSI-X is broken for devices passed through by
VMware up to at least ESXi 5.1. Actually, using INTx in that case instead
may still result in interrupt storms, with MSI being the only working
option in some configurations. So introduce a PCI_QUIRK_DISABLE_MSIX quirk
which only blacklists MSI-X but not also MSI and use it for the VMware
PCI-PCI-bridges. Note that, currently, we still assume that if MSI doesn't
work, MSI-X won't work either - but that's part of the internal logic and
not guaranteed as part of the API contract. While at it, add and employ
a pci_has_quirk() helper.
Reported and tested by: Paul Bucher
- Use NULL instead of 0 for pointers.
Submitted by: jhb (mostly)
Approved by: re (hrs), jhb
MFC r251282:
When auto-sizing the buffer cache, limit the amount of physical memory
used as the estimation of size, to 16GB. This provides around 100K of
buffer headers and corresponding KVA for buffer map at the peak.
Sizing the cache larger is not useful and only wastes and exhausts
KVA on large machines.
MFC note: the commit message was adjusted to match the code change, the
sizing cap is for 16GB, as noted by delphij.
MFC r245066
-------------------------------------------------------------------------
Teach the kernel to recognize that it is executing inside a bhyve virtual
machine.
-------------------------------------------------------------------------
This will help a 9.2 guest to run more effectively as a bhyve guest.
- Allow net.inet6.ip6.{accept_rtadv,no_radr} to be configured by
loader tunables as well, because they have to be configured before
interface initialization for AF_INET6.
- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
To configure an autoconfigured link-local address (RFC 4862), the
following rc.conf(5) configuration can be used:
ifconfig_bridge0_ipv6="inet6 auto_linklocal"
- if_bridge(4) now removes IPv6 addresses on a member interface to be
added when the parent interface or one of the existing member
interfaces has an IPv6 address. if_bridge(4) merges each link-local
scope zone which the member interfaces form respectively, so it causes
address scope violation. Removal of the IPv6 addresses prevents it.
- if_lagg(4) now removes IPv6 addresses on member interfaces
unconditionally.
- Set reasonable flags on non-IPv6-capable interfaces.
At boot time, all of the static routes are installed as before.
The differences are:
- "/etc/rc.d/netif start/stop <if>" now configures static routes
with :<if> if any.
- "/etc/rc.d/routing start/stop <af> <if>" works as well. <af> cannot be
omitted when <if> is specified, but a keyword "any" or "all" can be used
for <af> and <if>.
- ipv6_enable + ipv6_gateway_enable should unset ACCEPT_RTADV by default for
backward compatibility.
- Configurations in ipv6_prefix_IF should be recognized even if there is no
ifconfig_IF_ipv6.
- DAD wait should be performed at once, not on a per-interface basis, if
possible. This fixes an issue where a system with a lot of IPv6-capable
interfaces takes too long to boot.
- Add CIDR notation support like 192.168.1-2.10-16/24 to $ifconfig_IF_aliasN.
This is an extended version of ipv4_addr_IF which supports both IPv4 and
IPv6, and multiple range specifications. To avoid generating too many
addresses, the maximum number of generated addresses is currently
limited to 31.
- Add $ifconfig_IF_aliases, which accepts multiple IP aliases in a variable.
- ipv6_prefix_IF now supports !/64 prefix length. In addition to the old
64-bit format (2001:db8:1:1), a full 128-bit format like 2001:db8:1:1::/64
is supported.
- Replace ifconfig command with $IFCONFIG_CMD variable to support
a dry-run mode in the future.
- Remove IP aliases before removing all of the IPv4 addresses when doing
"rc.d/netif down".
- Add a DAD wait to network6_getladdr() because it is possible to fail to
configure an EUI64 address when ipv6_prefix_IF is specified.
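The range notation accepted by $ifconfig_IF_aliasN can be illustrated
with a small expansion routine. This is a Python sketch for IPv4 only, not
the rc(8) shell implementation; the function name is hypothetical, and
both stopping silently at the 31-address cap and repeating the prefix
length on every generated address are assumptions made for simplicity.

```python
from itertools import product

MAX_ADDRESSES = 31  # limit quoted in the commit text


def expand_ipv4_ranges(spec: str) -> list:
    """Expand a spec such as 192.168.1-2.10-16/24 into single addresses."""
    addr, _, prefixlen = spec.partition("/")
    octet_choices = []
    for part in addr.split("."):
        low, _, high = part.partition("-")
        high = high or low  # a plain octet is a one-element range
        octet_choices.append(range(int(low), int(high) + 1))
    result = []
    for octets in product(*octet_choices):
        if len(result) == MAX_ADDRESSES:
            break  # assumption: stop once the cap is reached
        text = ".".join(str(o) for o in octets)
        result.append(f"{text}/{prefixlen}" if prefixlen else text)
    return result


addresses = expand_ipv4_ranges("192.168.1-2.10-16/24")
print(len(addresses), addresses[0], addresses[-1])
```

The example from the commit text covers two values in the third octet and
seven in the fourth, so it expands to fourteen addresses, well under the
cap.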