grog [Mon, 10 Dec 2012 03:11:19 +0000 (03:11 +0000)]
MFC to r242840:
Add y flag and environment variable LS_SAMESORT to specify the same
sorting order for time and name with the -t option. IEEE Std 1003.2
(POSIX.2) mandates that the -t option sort in descending order, and
that if two files have the same timestamp, they should be sorted in
ascending order of their names. The -r flag reverses both of these
sort orders, so they're never the same. This creates significant
problems for sequentially named files stored on FAT file systems,
where it can be impossible to list them in the order in which they
were created.
Add , (comma) option to print file sizes grouped and separated by
thousands using the non-monetary separator returned by localeconv(3),
typically a comma or period.
eadler [Mon, 10 Dec 2012 02:33:17 +0000 (02:33 +0000)]
MFC r243084:
Add support for a -q flag. While here make the custom argument parsing
use getopt instead of hacking on it more. This change also fixes the
method of silencing the compiler warning about gfn being used
uninitialized.
mav [Sat, 8 Dec 2012 07:37:12 +0000 (07:37 +0000)]
MFC r243571:
Fix problem with the Samsung 840 PRO series SSD detection.
The device reports support for SATA Asynchronous Notification in its
IDENTIFY data, but returns error on attempt to enable that feature.
Make SATA XPT of CAM only report these errors, but not fail the device.
eadler [Sat, 8 Dec 2012 00:28:16 +0000 (00:28 +0000)]
MFC r243893:
Remove hack to emulate effective uid and just use the EUID's name in the
first place. I was unaware of this option when originally committing
this change.
I accidently merged this to only one directory last time :(
eadler [Sat, 8 Dec 2012 00:25:51 +0000 (00:25 +0000)]
MFC r243893:
Remove hack to emulate effective uid and just use the EUID's name in the
first place. I was unaware of this option when originally committing
this change.
Note that the I/O systems in FreeBSD and Solaris/Illumos are sufficiently
different that there is not a 1:1 mapping from scripts that work
with one to the other.
This commit includes the fix so that our probes use "kernel"
instead of the Solaris specific "genunix"
kib [Fri, 7 Dec 2012 01:13:07 +0000 (01:13 +0000)]
MFC r243142:
In pget(9), if PGET_NOTWEXIT flag is not specified, also search the
zombie list for the pid. This allows several kern.proc sysctls to
report useful information for zombies.
Hold the allproc_lock around all searches instead of relocking it.
Remove private pfind_locked() from the new nfs client code.
MFC r243528 (by pjd):
Look for zombie process only if we were given process id.
r243680:
cxgbe/tom: Add a flag to indicate that the L2 table entry for an
embryonic connection has been setup and never attempt to abort a tid
before this is done. This fixes a bad race where a listening socket is
closed when the driver is in the middle of step (b) here. The symptom
of this were "ARP miss" errors from the driver followed by tid leaks.
A hardware-offloaded passive open works this way:
a) A SYN "hits" the TCAM entry for a server tid and the chip delivers it
to the queue associated with the server tid (say, queue A). It waits
for a response from the driver telling it what to do.
b) The driver decides it is ok to proceed. It adds the new tid to the
list of embryonic connections associated with the server tid and then
hands off the SYN to the kernel's syncache to make sure that the kernel
okays it too. If it does then the driver provides an L2 table entry,
queue id (say, queue B), etc. and instructs the chip to send the SYN/ACK
response.
c) The chip delivers a status to queue B depending on how the third step
of the 3-way handshake goes. The driver removes the tid from its list
of embryonic connections and either expands the syncache entry or
destroys the tid. In any case all subsequent messages for the new tid
will be delivered to queue B, not queue A. Anything running in queue B
knows that the L2 entry has long been setup and the new flag is of no
interest from here on. If the listener is closed it will deal with
so_comp as normal.
r243681:
cxgbe/tom: Handle the case where the chip falls out of DDP mode by
itself. The hole in the receive sequence space corresponds to the
number of bytes placed directly up to that point.
rwatson [Thu, 6 Dec 2012 11:52:31 +0000 (11:52 +0000)]
Early MFC of portions of r243752 adding an auditdistd user to stable/8
in order to ease future upgrades; the remainder of r243752 is left for
a future MFC of the OpenBSM upgrade:
Merge a number of changes required to hook up OpenBSM 1.2-alpha2's
auditdistd (distributed audit daemon) to the build:
- Manual cross references
- Makefile for auditdistd
- rc.d script, rc.conf entrie
- New group and user for auditdistd; associated aliases, etc.
The audit trail distribution daemon provides reliable,
cryptographically protected (and sandboxed) delivery of audit tails
from live clients to audit server hosts in order to both allow
centralised analysis, and improve resilience in the event of client
compromises: clients are not permitted to change trail contents
after submission.
Submitted by: pjd
Sponsored by: The FreeBSD Foundation (auditdistd)
gjb [Thu, 6 Dec 2012 00:25:42 +0000 (00:25 +0000)]
MFC r242897:
Prevent including .zfs snapshot directories in the src.txz
distribution. This can happen if the src/ tree checkout is
within its own ZFS dataset, and the 'snapdir' ZFS property
is set to 'visible.'
pfg [Wed, 5 Dec 2012 22:47:45 +0000 (22:47 +0000)]
MFC r243641:
Partially bring r242520 to ext2fs.
When a file is first being written, the dynamic block reallocation
(implemented by ext2_reallocblks) relocates the file's blocks
so as to cluster them together into a contiguous set of blocks on
the disk.
When the cluster crosses the boundary into the first indirect block,
the first indirect block is initially allocated in a position
immediately following the last direct block. Block reallocation
would usually destroy locality by moving the indirect block out of
the way to keep the data blocks contiguous.
The issue was diagnosed long ago by Bruce Evans on ffs and surfaced
on ext2fs when block reallocaton was ported. This is only a partial
solution based on the similarities with FFS. We still require more
review of the allocation details that vary in ext2fs.
kib [Tue, 4 Dec 2012 00:57:11 +0000 (00:57 +0000)]
MFC r243342:
Schedule garbage collection run for the in-flight rights passed over
the unix domain sockets to the next tick, coalescing the serial calls
until the collection fires.
kib [Tue, 4 Dec 2012 00:54:49 +0000 (00:54 +0000)]
MFC r243341:
Add a special meaning to the negative ticks argument for
taskqueue_enqueue_timeout(). Do not rearm the callout if it is
already armed and the ticks is negative. Otherwise rearm it to fire
in abs(ticks) ticks in the future.
delphij [Mon, 3 Dec 2012 18:37:02 +0000 (18:37 +0000)]
MFC r242681 (ambrisko):
- Extend the prior commit to use the generic SCSI command building
function use that for JBOD and Thunderbolt disk write command. Now
we only have one implementation in mfi.
- Fix dumping on Thunderbolt cards. Polled IO commands do not seem to
be normally acknowledged by changing cmd_status to MFI_STAT_OK.
In order to get acknowledgement of the IO is complete, the Thunderbolt
command queue needs to be run through. I added a flag MFI_CMD_SCSI
to indicate this command is being polled and to complete the
Thunderbolt wrapper and indicate the result. This flag needs to be
set in the JBOD case in case if that us using Thunderbolt card.
When in the polling loop check for completed commands.
- Remove mfi_tbolt_is_ldio and just do the check when needed.
- Fix an issue when attaching of disk device happens when a device is
already scheduled to be attached but hasn't attached.
- add a tunable to allow raw disk attachment to CAM via:
hw.mfi.allow_cam_disk_passthrough=1
- fixup aborting of commands (AEN and LD state change). Use a generic
abort function and only wait the command being aborted not both.
Thunderbolt cards don't seem to abort commands so the abort times
out.
delphij [Mon, 3 Dec 2012 18:34:54 +0000 (18:34 +0000)]
MFC r242497:
Copy code from scsi_read_write() as mfi_build_syspd_cdb() to build SCSI
command properly. Without this change, mfi(4) always sends 10 byte READ
and WRITE commands, which will cause data corruption when device is
larger than 2^32 sectors.
dim [Sun, 2 Dec 2012 00:31:23 +0000 (00:31 +0000)]
Pull in r158245 from upstream clang:
[C++11 Compat] Fix breaking change in C++11 pair copyctor.
While this code is valid C++98, it is not valid C++11. The problem
can be reduced to:
class MDNode;
class DIType {
operator MDNode*() const {return 0;}
};
class WeakVH {
WeakVH(MDNode*) {}
};
int main() {
DIType di;
std::pair<void*, WeakVH> p(std::make_pair((void*)0, di)));
}
This was not detected by any of the bots we have because they either
compile C++98 with libstdc++ (which allows it), or C++11 with libc++
(which incorrectly allows it). I ran into the problem when compiling
with VS 2012 RC.
Thanks to Richard for explaining the issue.
This fixes building clang 3.1 on stable/9 with libc++ in C++11 mode.
This is a direct commit to stable/9, since there is no separate commit
in head which has just this particular change, and I do not want to do a
full import at this time.
rmacklem [Sat, 1 Dec 2012 01:11:59 +0000 (01:11 +0000)]
MFC: r241568
Add a new '-S' option to mountd, which tells it to suspend
execution of the nfsd threads while it is reloading the exports.
This avoids clients from getting intermittent access errors
when the exports are being reloaded non-atomically.
It is not an ideal solution, since requests will back up while
the nfsd threads are suspended. Also, when this option is used,
if mountd crashes while reloading exports, mountd will have to
be restarted to get the nfsd threads to resume execution.
This has been tested by Vincent Hoffman (vince at unsane.co.uk)
and John Hickey (jh at deterlab.net).
The nfse patch offers a more comprehensive solution for this issue.
rmacklem [Sat, 1 Dec 2012 01:07:51 +0000 (01:07 +0000)]
MFC: r241561
Add two new options to the nfssvc(2) syscall that allow
processes running as root to suspend/resume execution
of the kernel nfsd threads. An earlier version of this
patch was tested by Vincent Hoffman (vince at unsane.co.uk)
and John Hickey (jh at deterlab.net).
mav [Thu, 29 Nov 2012 18:08:36 +0000 (18:08 +0000)]
MFC r242323, r242328:
Add basic BIO_DELETE support to GEOM RAID class for all RAID levels.
If at least one subdisk in the volume supports it, BIO_DELETE requests
will be propagated down. Unfortunatelly, for RAID levels with redundancy
unmapped blocks will be mapped back during first rebuild/resync process.
MFC r236884:
Introduce "feature flags" for ZFS pools (bump SPA version to 5000).
Add first feature "com.delphix:async_destroy" (asynchronous destroy
of ZFS datasets).
Implement features support in ZFS boot code.
Illumos revisions merged:
13700:2889e2596bd6
13701:1949b688d5fb
2619 asynchronous destruction of ZFS file systems
2747 SPA versioning with zfs feature flags
1796 "ZFS HOLD" should not be used when doing "ZFS SEND" froma read-only pool
2871 support for __ZFS_POOL_RESTRICT used by ZFS test suite
2903 zfs destroy -d does not work
2957 zfs destroy -R/r sometimes fails when removing defer-destroyed snapshot
MFC r238926:
Partial MFV (illumos-gate 13753:2aba784c276b)
2762 zpool command should have better support for feature flags
References:
https://www.illumos.org/issues/2762
MFC r238950:
Fix reporting of root pool upgrade notice.
MFC r238951:
Fix wrong indent according to style(9)
MFC r239389:
Backport fix for vendor issue #3085
3085 zfs diff panics, then panics in a loop on booting
References:
https://www.illumos.org/issues/3085
MFC r239394:
Update zfs(8) manpage with illumos version of "zfs diff"
Illumos issue:
2399 zfs manual page does not document use of "zfs diff"
References:
https://www.illumos.org/issues/2399
MFC r239620 [2]:
Merge recent vendor changes:
3086 unnecessarily setting DS_FLAG_INCONSISTENT on async destroyed datasets
3090 vdev_reopen() during reguid causes vdev to be treated as corrupt
3102 vdev_uberblock_load() and vdev_validate() may read the wrong label
MFC r240153 (gjb) [3]:
Typo fix and minor word swap.
MFC r240303:
Add assfail() and assfail3() to the opensolaris module.
Remove obsoleted intermediate cddl/compat/opensolaris/sys/debug.h.
MFC r240345 (avg):
zfs: fix sa_modify_attrs handling of variable-sized attributes
- skip length_idx index for a replaced variable-sized attribute
- skip length_idx index for a removed variable-sized attribute
- also re-arranged code to make sure that length_idx is always
incremented for variable-sized attributes
- additionally add an assertion that the number of actually produced
attributes is the same as the expected number of resulting
attributes
Illumos issued covered:
1884 Empty "used" field for zfs *space commands
3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first argument
is zero
3028 zfs {group,user}space -n prints (null) instead of numeric GID/UID
3048 zfs {user,group}space [-s|-S] is broken
3049 zfs {user,group}space -t doesn't really filter the results
3060 zfs {user,group}space -H output isn't tab-delimited
3061 zfs {user,group}space -o doesn't use specified fields order
3064 usr/src/cmd/zpool/zpool_main.c misspells "successful"
3093 zfs {user,group}space's -i is noop
3098 zfs userspace/groupspace fail without saying why when run as non-root
MFC r240955 (partial):
Merge recent vendor changes in ZFS.
Illumos issued covered:
3139 zdb dies when it tries to determine path of unlinked file
3189 kernel panic in ZFS test suite during hotspare_onoffline_004_neg
3208 moving zpool cross-endian results in incorrect user/group accounting
MFC r241655:
Add missing initialization for do_prefix.
Corrects porting error in r238391
Vendor issue and changeset reference:
2883 changing "canmount" property to "on" should not always remount dataset
https://www.illumos.org/issues/2883
Changeset 13743:95aba6e49b9f
MFC r243014:
Move zpool-features manual page from section 5 to section 7
and fix references
hselasky [Wed, 28 Nov 2012 18:10:05 +0000 (18:10 +0000)]
MFC r242619, r242695, r242702 and r242703:
Implement support for RTS (flow control).
Improve USB serial console support.
Implement a USB serial jitter buffer in receive direction.
ae [Tue, 27 Nov 2012 01:59:51 +0000 (01:59 +0000)]
MFC r242079:
Remove the IPFIREWALL_FORWARD kernel option and make possible to turn
on the related functionality in the runtime via the sysctl variable
net.pfil.forward. It is turned off by default.
MFC r242082:
Note the removal of the IPFIREWALL_FORWARD kernel option.
MFC r242463:
Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.
mav [Mon, 26 Nov 2012 15:34:28 +0000 (15:34 +0000)]
MFC r238943:
Add several performance optimizations to acpi_cpu_idle().
For C1 and C2 states use cpu_ticks() to measure sleep time instead of much
slower ACPI timer. We can't do it for C3, as TSC may stop there. But it is
less important there as wake up latency is high any way.
For C1 and C2 states do not check/clear bus mastering activity status, as
it is important only for C3. As side effect it can make CPU enter C2 instead
of C3 if last BM activity was two sleeps back (unlike one before), but
that may be even good because of collecting more statistics. Premature BM
wakeup from C3, entered because of overestimation, can easily be worse then
entering C2 from both performance and power consumption points of view.
Together on dual Xeon E5645 system on sequential 512 bytes read test this
change makes cpu_idle_acpi() as fast as simplest cpu_idle_hlt() and only
few percents slower then cpu_idle_mwait(), while deeper states are still
actively used during idle periods.
To help with diagnostics, add C-state type into dev.cpu.X.cx_supported.
yongari [Mon, 26 Nov 2012 04:39:41 +0000 (04:39 +0000)]
MFC r242426:
TCP/UDP checksum offloading feature for IP fragmented datagram was
removed in r99417. bge(4) controllers can do TCP checksum offload
for IP fragmented datagrams but unlike ti(4), it lacks UDP checksum
offloading for IP fragmented datagrams. The problem was bge(4)
blindly requested TCP/UDP checksum for IP fragmented datagrams such
that it resulted in corrupted UDP datagrams before r99417.
Remove remaining code for TCP checksum offloading for IP fragmented
datagrams which should have been removed in r99417.
yongari [Mon, 26 Nov 2012 04:34:05 +0000 (04:34 +0000)]
MFC r241983-241985:
r241983:
Do not hardcode phy address. Multi-port controllers use different phy
address.
r241984:
Ethernet@WireSpeed is defined for 1000baseT adapter to establish a
link at a lower speed so enabling it for fiber adapters is wrong.
Fix the issue by setting BGE_PHY_NO_WIRESPEED such that brgphy(4)
wouldn't enable the feature.
While I'm here move PHY specific feature/bug configuration to new
location(just before mii attach) for readability.
r241985:
For fast ethernet controllers, Ethernet@WireSpeed is not defined so
explicitly set BGE_PHY_NO_WIRESPEED flag.
yongari [Mon, 26 Nov 2012 04:25:41 +0000 (04:25 +0000)]
MFC r241438:
Add APE firmware support and improve firmware handshake procedure.
This change will enable IPMI access on 5717/5718/5719/5720 and 5761
controllers. Because ASF is not available when APE firmware is
present, bge_allow_asf tunable is ignored when driver detects APE
firmware. Also bge(4) no longer performs two resets(one blind
reset and the other reset with firmware in mind) in device attach.
Now bge(4) performs a reset with enough information in bge_reset().
The APE firmware also needs special handling to make suspend/resume
work but it was not implemented yet.
With this change, bge(4) should work on any 5717/5718/5719/5720
controllers. Special thanks to Mike Hibler at Emulab who setup
remote debugging on Dell R820. Without his help I couldn't be able
to address several issues happened on Dell Rx20 systems. And many
thanks to Broadcom for continuing to support FreeBSD!
Submitted by: davidch (initial version)
H/W donated by: Broadcom
Tested by: many
Tested on: Del R820/R720/R620/R420/R320 and HP Proliant DL 360 G8
yongari [Mon, 26 Nov 2012 04:20:11 +0000 (04:20 +0000)]
MFC r241437:
For 5717C/5719C/5720C and 57765 PHYs, do not perform any special
handling(jumbo, wire speed etc) in brgphy_reset(). Touching
BRGPHY_MII_AUXCTL register seems to confuse APE firmware such that
it couldn't establish a link.
yongari [Mon, 26 Nov 2012 04:10:27 +0000 (04:10 +0000)]
MFC r241436:
Rework controller reset procedure. Previously driver saved
BGE_PCI_PCISTATE register before issuing global reset. After
issuing reset, it reads BGE_PCI_PCISTATE register again and
compares the saved register value and current value. It was used to
know whether the global reset operation was completed or not.
Unfortunately, this logic caused several issues on recent BCM5717/
5718/5719 and BCM5720 controllers. It seems APE firmware accesses
some registers while global reset is in progress such that reading
BGE_PCI_PCISTATE register after reset does not yield old pre-reset
state value. This resulted in consuming too much time in global
reset and sometimes it couldn't successfully complete reset.
The BGE_MISCCFG_RESET_CORE_CLOCKS of BGE_MISC_CFG register is
self-clearing bit so driver is able to know the reset completion.
But the core-lock reset will disable indirect/flat/standard access
modes such that driver cannot poll BGE_MISCCFG_RESET_CORE_CLOCKS
bit of BGE_MISC_CFG register. So just wait enough time for
core-clock reset to complete.
Data sheet says driver should wait 100us for PCI/PCI-X devices and
100ms for PCIe devices. I chose 1ms for PCI/PCI-X since this value
was used for many years in bge(4). For PCIe devices, use 100ms as
recommended by data sheet.
bge_chipinit() also cleared BGE_MAC_MODE register which shall clear
firmware configured mode information. I think this will result in
losing ASF/IPMI link in device attachment. Let bge_reset() honor
firmware configured BGE_MAC_MODE register and don't announce driver
is UP in bge_reset(). Firmware should have control over driver until
it's fully initialized by driver.
While I'm here, enable workaround for PCI-X BCM5704 A0 in
bge_reset(). This will prevent internal arbitration logic from
switching to the other DMA engine after a retry cycle.
yongari [Mon, 26 Nov 2012 02:41:30 +0000 (02:41 +0000)]
MFC r241388-241393:
r241388:
If the maximum payload size is 256 bytes or more, set the DMA write
water mark to 256 bytes. Otherwise controller will encounter DMA
write under run errors and would result in RX DMA hang. If the
maximum payload size is 128 bytes, the water mark is set to 128
bytes as usual.
While here, set maximum read request size to 2048 for BCM5719/BCM5720.
For other PCIe devices, use 4096. And reprogram the maximum read
request size whenever device reset is performed.
r241389:
On PHY write error use hex number to show the value.
Add more comments.
r241390:
Honor PHY type fiber for BCM5717/BCM5718/BCM5719/BCM5720.
r241391:
Do not force PCIe 1.0a mode in device reset on BCM5717 and newer
controllers. BCM5785 does not require PCI 1.0a mode as well during
reset.
r241392:
Fix a long standing VCPU reset sequence bug on BCM5906.
The VCPU(Virtual CPU) of BCM5906 is used to provide a mechanism to
control the bootcode execution and to pick up configuration data
stored inside the EEPROM.
The bootcode of BCM5906 will check the BGE_VCPU_STATUS_DRV_RESET
bit to decide which booting procedure to choose.
Data sheet indicates the VCPU of BCM5906 should set
BGE_VCPU_STATUS_DRV_RESET bit *before* VCPU reset or global reset.
r241393:
Remove unnecessary delay. I don't see any comments in data sheet
that requires 10ms delay after device reset. Because that code was
there from day 1, I guess it was added to give enough settlement
time after updating BGE_MAC_MODE register.
The recommended delay time for BGE_MAC_MODE after updating is 40us
and it was already done in r241219.
nyan [Fri, 23 Nov 2012 15:42:25 +0000 (15:42 +0000)]
MFC: r240855, r242867, r242868, r242869
MFi386: r237445
Commit changes missed from r237435. Properly calculate the signal
trampoline addresses after the shared page is enabled. Handle FreeBSD
ABIs without shared page support too.
MFi386: r238792
Introduce curpcb magic variable.
MFi386: r211924
Register an interrupt vector for DTrace return probes.
Fix some KASSERTs.
They are missing changes from r208833, r227394 and r227442.
glebius [Fri, 23 Nov 2012 11:23:15 +0000 (11:23 +0000)]
Merge r242467 from head:
- If DRM_DEBUG_DEFAULT_ON is defined, then initialize drm_debug_flagi to
all supported debugging bits.
- If DRM_DEBUG_DEFAULT_ON isn't defined, then initialize drm_debug_flag
to zero.
DRM_DEBUG_DEFAULT_ON is defined when module is build with -DDEBUG_DRM
or if kernel config has 'options DEBUG_DRM'.
glebius [Fri, 23 Nov 2012 11:19:43 +0000 (11:19 +0000)]
Merge r241037 from head:
The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.
o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
ALTQ queueing or buf_ring(9) queueing.
- It doesn't touch the buf_ring stats any more.
- It doesn't touch ifnet stats anymore.
- drbr_stats_update() no longer exists.
o buf_ring(9) handles its stats itself:
- It handles br_drops itself.
- br_prod_bytes stats are dropped. Rationale: no one ever
reads them but update of a common counter on every packet
negatively affects performance due to excessive cache
invalidation.
- buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
we no longer account bytes.
o Drivers handle their stats theirselves: if_obytes, if_omcasts.
o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
use drbr_stats_update(), and update ifnet stats theirselves.
o bxe(4) was the most correct driver, it didn't call
drbr_stats_update(), thus it was the only driver accurate under
moderate load. Now it also maintains stats itself.
o ixgbe(4) had already taken stats from hardware, so just
- drop software stats updating.
- take multicast packet count from hardware as well.
o mxge(4) just no longer needs NO_SLOW_STATS define.
o cxgb(4), cxgbe(4) need no change, since they obtain stats
from hardware.