jilles [Mon, 17 Dec 2012 12:26:10 +0000 (12:26 +0000)]
MFC r239150: nftw(): Do not check the maxfds argument against OPEN_MAX.
Apart from the fact that nothing should have OPEN_MAX as a limit (as opposed
to RLIMIT_NOFILE from getrlimit() or _SC_OPEN_MAX from sysconf()), POSIX
does not require us to check this.
delphij [Mon, 17 Dec 2012 06:35:15 +0000 (06:35 +0000)]
Note that the manual page of less(1) says:
Note that a preprocessor cannot output an empty file, since that
is interpreted as meaning there is no replacement, and the origi-
nal file is used. To avoid this, if LESSOPEN starts with two ver-
tical bars, the exit status of the script becomes meaningful. If
the exit status is zero, the output is considered to be replace-
ment text, even if it empty. If the exit status is nonzero, any
output is ignored and the original file is used. For compatibil-
ity with previous versions of less, if LESSOPEN starts with only
one vertical bar, the exit status of the preprocessor is ignored.
Use two pipe symbols for zless, so that zless'ing a compressed empty
file will give output rather than being interpreted as its compressed
form, which is typically a binary.
Thanks Mark Nudelman for pointing out this difference and the
suggested solution.
Remove redundant call to AUDIT_ARG_UPATH1().
Path will be remembered by the following NDINIT(AUDITVNODE1) call.
Sponsored by: The FreeBSD Foundation (auditdistd)
r243720:
IFp4 @208381:
For VOP_GETATTR() we just need vnode to be shared-locked.
Sponsored by: The FreeBSD Foundation (auditdistd)
r243722:
IFp4 @208382:
Currently on each record write we call VFS_STATFS() to get available space
on the file system as well as VOP_GETATTR() to get trail file size.
We can assume that trail file is only updated by the audit worker, so instead
of asking for file size on every write, get file size on trail switch only
(it should be zero, but it's not expensive) and use global variable audit_size
protected by the audit worker lock to keep track of trail file's size.
This eliminates VOP_GETATTR() call for every write. VFS_STATFS() is satisfied
from in-memory data (mount->mnt_stat), so shouldn't be expensive.
Sponsored by: The FreeBSD Foundation (auditdistd)
r243723:
IFp4 @208383:
Currently when we discover that trail file is greater than configured
limit we send AUDIT_TRIGGER_ROTATE_KERNEL trigger to the auditd daemon
once. If for some reason auditd didn't rotate trail file it will never
be rotated.
Change it by sending the trigger when trail file size grows by the
configured limit. For example if the limit is 1MB, we will send trigger
on 1MB, 2MB, 3MB, etc.
This is also needed for the auditd change that will be committed soon
where auditd may ignore the trigger - it might be ignored if kernel
requests the trail file to be rotated too quickly (often than once a second)
which would result in overwriting previous trail file.
Sponsored by: The FreeBSD Foundation (auditdistd)
r243726:
IFp4 @208451:
Fix path handling for *at() syscalls.
Before the change directory descriptor was totally ignored,
so the relative path argument was appended to current working
directory path and not to the path provided by descriptor, thus
wrong paths were stored in audit logs.
Now that we use directory descriptor in vfs_lookup, move
AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() calls to the place where
we hold file descriptors table lock, so we are sure paths will
be resolved according to the same directory in audit record and
in actual operation.
Sponsored by: The FreeBSD Foundation (auditdistd)
Reviewed by: rwatson
rmacklem [Sun, 16 Dec 2012 14:01:56 +0000 (14:01 +0000)]
MFC: r243782
Add an nfssvc() option to the kernel for the new NFS client
which dumps out the actual options being used by an NFS mount.
This will be used to add a "-m" option to nfsstat(1).
glebius [Sat, 15 Dec 2012 08:29:22 +0000 (08:29 +0000)]
Merge from head r244157:
Fix a crash in tcp_input(), that happens when mbuf has a fwd_tag on it,
but later after processing and freeing the tag, we need to jump back again
to the findpcb label. Since the fwd_tag pointer wasn't NULL we tried to
process and free the tag for second time.
kib [Thu, 13 Dec 2012 06:17:05 +0000 (06:17 +0000)]
MFC r242958:
Add the wait6(2) system call. It takes POSIX waitid()-like process
designator to select a process which is waited for. The system call
optionally returns siginfo_t which would be otherwise provided to
SIGCHLD handler, as well as extended structure accounting for child
and cumulative grandchild resource usage.
Allow to get the current rusage information for non-exited processes
as well, similar to Solaris.
The explicit WEXITED flag is required to wait for exited processes,
allowing for more fine-grained control of the events the waiter is
interested in.
Fix the handling of siginfo for WNOWAIT option for all wait*(2)
family, by not removing the queued signal state.
PR: standards/170346
MFC r243133:
Style fixes for r242958.
MFC r243134:
Alphabetically reorder the forward-declarations of the structures.
Add the declaration for enum idtype, to be used later.
MFC r243135:
Move the definition of the idtype_t from sys/types.h to sys/wait.h.
Fix the bug, use #if __BSD_VISIBLE instead of #if defined(__BSD_VISIBLE),
since __BSD_VISIBLE is always defined.
Reformat the comments from the Solaris style to KNF.
MFC r243136:
Restore the proper handling of the pid 0 for waitpid(2).
Fix the style around.
gjb [Wed, 12 Dec 2012 01:05:19 +0000 (01:05 +0000)]
MFC r244057, r244059:
r244057:
Get 'uname -r' earlier, so it can be used to determine what branch is
being run to set BSDINSTALL_DISTSITE accordingly. This change allows
non-RELEASE branches to use the FTP snapshots directory for bootonly.iso
installations.
jimharris [Wed, 12 Dec 2012 00:39:04 +0000 (00:39 +0000)]
MFC r243904:
Don't call bus_dmamap_load in CAM_DIR_NONE case, since there is nothing
to map, and technically this isn't allowed.
Functionally, it works OK (at least on x86) to call bus_dmamap_load with
a NULL data pointer and zero length, so this is primarily for correctness
and consistency with other drivers.
While here, remove check in isci_io_request_construct for nseg==0.
Previously, bus_dmamap_load would pass nseg==1, even for case where
buffer is NULL and length = 0, which allowed CAM_DIR_NONE CCBs
to get processed. This check is not correct though, and needed to be
removed both for the changes elsewhere in this patch, as well as jeff's
preliminary bus_dmamap_load_ccb patch (which uncovered all of this in
the first place).
MFC r243503:
Illumos 13879:4eac7a87eff2
3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb()
3330 space_seg_t should have its own kmem_cache
3331 deferred frees should happen after sync_pass 1
3335 make SYNC_PASS_* constants tunable
New loader-only tunables:
vfs.zfs.sync_pass_deferred_free
vfs.zfs.sync_pass_dont_compress
vfs.zfs.sync_pass_rewrite
MFC r243524:
Import the zio nop-write improvement from Illumos. To reduce I/O,
nop-write omits overwriting data if the checksum (cryptographically
secure) of new data matches the checksum of existing data.
It also saves space if snapshots are in use.
It currently works only on datasets with enabled compression, disabled
deduplication and sha256 checksums.
glebius [Mon, 10 Dec 2012 12:47:33 +0000 (12:47 +0000)]
Merge r242474:
Remove separate paragraph on ASCII messages and instead
provide this information along with messages documentation,
like this done in manual pages for other netgraph nodes.
grog [Mon, 10 Dec 2012 03:11:19 +0000 (03:11 +0000)]
MFC to r242840:
Add y flag and environment variable LS_SAMESORT to specify the same
sorting order for time and name with the -t option. IEEE Std 1003.2
(POSIX.2) mandates that the -t option sort in descending order, and
that if two files have the same timestamp, they should be sorted in
ascending order of their names. The -r flag reverses both of these
sort orders, so they're never the same. This creates significant
problems for sequentially named files stored on FAT file systems,
where it can be impossible to list them in the order in which they
were created.
Add , (comma) option to print file sizes grouped and separated by
thousands using the non-monetary separator returned by localeconv(3),
typically a comma or period.
eadler [Mon, 10 Dec 2012 02:33:17 +0000 (02:33 +0000)]
MFC r243084:
Add support for a -q flag. While here make the custom argument parsing
use getopt instead of hacking on it more. This change also fixes the
method of silencing the compiler warning about gfn being used
uninitialized.
mav [Sat, 8 Dec 2012 07:37:12 +0000 (07:37 +0000)]
MFC r243571:
Fix problem with the Samsung 840 PRO series SSD detection.
The device reports support for SATA Asynchronous Notification in its
IDENTIFY data, but returns error on attempt to enable that feature.
Make SATA XPT of CAM only report these errors, but not fail the device.
eadler [Sat, 8 Dec 2012 00:28:16 +0000 (00:28 +0000)]
MFC r243893:
Remove hack to emulate effective uid and just use the EUID's name in the
first place. I was unaware of this option when originally committing
this change.
I accidently merged this to only one directory last time :(
eadler [Sat, 8 Dec 2012 00:25:51 +0000 (00:25 +0000)]
MFC r243893:
Remove hack to emulate effective uid and just use the EUID's name in the
first place. I was unaware of this option when originally committing
this change.
Note that the I/O systems in FreeBSD and Solaris/Illumos are sufficiently
different that there is not a 1:1 mapping from scripts that work
with one to the other.
This commit includes the fix so that our probes use "kernel"
instead of the Solaris specific "genunix"
kib [Fri, 7 Dec 2012 01:13:07 +0000 (01:13 +0000)]
MFC r243142:
In pget(9), if PGET_NOTWEXIT flag is not specified, also search the
zombie list for the pid. This allows several kern.proc sysctls to
report useful information for zombies.
Hold the allproc_lock around all searches instead of relocking it.
Remove private pfind_locked() from the new nfs client code.
MFC r243528 (by pjd):
Look for zombie process only if we were given process id.
r243680:
cxgbe/tom: Add a flag to indicate that the L2 table entry for an
embryonic connection has been setup and never attempt to abort a tid
before this is done. This fixes a bad race where a listening socket is
closed when the driver is in the middle of step (b) here. The symptom
of this were "ARP miss" errors from the driver followed by tid leaks.
A hardware-offloaded passive open works this way:
a) A SYN "hits" the TCAM entry for a server tid and the chip delivers it
to the queue associated with the server tid (say, queue A). It waits
for a response from the driver telling it what to do.
b) The driver decides it is ok to proceed. It adds the new tid to the
list of embryonic connections associated with the server tid and then
hands off the SYN to the kernel's syncache to make sure that the kernel
okays it too. If it does then the driver provides an L2 table entry,
queue id (say, queue B), etc. and instructs the chip to send the SYN/ACK
response.
c) The chip delivers a status to queue B depending on how the third step
of the 3-way handshake goes. The driver removes the tid from its list
of embryonic connections and either expands the syncache entry or
destroys the tid. In any case all subsequent messages for the new tid
will be delivered to queue B, not queue A. Anything running in queue B
knows that the L2 entry has long been setup and the new flag is of no
interest from here on. If the listener is closed it will deal with
so_comp as normal.
r243681:
cxgbe/tom: Handle the case where the chip falls out of DDP mode by
itself. The hole in the receive sequence space corresponds to the
number of bytes placed directly up to that point.
rwatson [Thu, 6 Dec 2012 11:52:31 +0000 (11:52 +0000)]
Early MFC of portions of r243752 adding an auditdistd user to stable/8
in order to ease future upgrades; the remainder of r243752 is left for
a future MFC of the OpenBSM upgrade:
Merge a number of changes required to hook up OpenBSM 1.2-alpha2's
auditdistd (distributed audit daemon) to the build:
- Manual cross references
- Makefile for auditdistd
- rc.d script, rc.conf entrie
- New group and user for auditdistd; associated aliases, etc.
The audit trail distribution daemon provides reliable,
cryptographically protected (and sandboxed) delivery of audit tails
from live clients to audit server hosts in order to both allow
centralised analysis, and improve resilience in the event of client
compromises: clients are not permitted to change trail contents
after submission.
Submitted by: pjd
Sponsored by: The FreeBSD Foundation (auditdistd)
gjb [Thu, 6 Dec 2012 00:25:42 +0000 (00:25 +0000)]
MFC r242897:
Prevent including .zfs snapshot directories in the src.txz
distribution. This can happen if the src/ tree checkout is
within its own ZFS dataset, and the 'snapdir' ZFS property
is set to 'visible.'
pfg [Wed, 5 Dec 2012 22:47:45 +0000 (22:47 +0000)]
MFC r243641:
Partially bring r242520 to ext2fs.
When a file is first being written, the dynamic block reallocation
(implemented by ext2_reallocblks) relocates the file's blocks
so as to cluster them together into a contiguous set of blocks on
the disk.
When the cluster crosses the boundary into the first indirect block,
the first indirect block is initially allocated in a position
immediately following the last direct block. Block reallocation
would usually destroy locality by moving the indirect block out of
the way to keep the data blocks contiguous.
The issue was diagnosed long ago by Bruce Evans on ffs and surfaced
on ext2fs when block reallocaton was ported. This is only a partial
solution based on the similarities with FFS. We still require more
review of the allocation details that vary in ext2fs.
kib [Tue, 4 Dec 2012 00:57:11 +0000 (00:57 +0000)]
MFC r243342:
Schedule garbage collection run for the in-flight rights passed over
the unix domain sockets to the next tick, coalescing the serial calls
until the collection fires.
kib [Tue, 4 Dec 2012 00:54:49 +0000 (00:54 +0000)]
MFC r243341:
Add a special meaning to the negative ticks argument for
taskqueue_enqueue_timeout(). Do not rearm the callout if it is
already armed and the ticks is negative. Otherwise rearm it to fire
in abs(ticks) ticks in the future.
delphij [Mon, 3 Dec 2012 18:37:02 +0000 (18:37 +0000)]
MFC r242681 (ambrisko):
- Extend the prior commit to use the generic SCSI command building
function use that for JBOD and Thunderbolt disk write command. Now
we only have one implementation in mfi.
- Fix dumping on Thunderbolt cards. Polled IO commands do not seem to
be normally acknowledged by changing cmd_status to MFI_STAT_OK.
In order to get acknowledgement of the IO is complete, the Thunderbolt
command queue needs to be run through. I added a flag MFI_CMD_SCSI
to indicate this command is being polled and to complete the
Thunderbolt wrapper and indicate the result. This flag needs to be
set in the JBOD case in case if that us using Thunderbolt card.
When in the polling loop check for completed commands.
- Remove mfi_tbolt_is_ldio and just do the check when needed.
- Fix an issue when attaching of disk device happens when a device is
already scheduled to be attached but hasn't attached.
- add a tunable to allow raw disk attachment to CAM via:
hw.mfi.allow_cam_disk_passthrough=1
- fixup aborting of commands (AEN and LD state change). Use a generic
abort function and only wait the command being aborted not both.
Thunderbolt cards don't seem to abort commands so the abort times
out.
delphij [Mon, 3 Dec 2012 18:34:54 +0000 (18:34 +0000)]
MFC r242497:
Copy code from scsi_read_write() as mfi_build_syspd_cdb() to build SCSI
command properly. Without this change, mfi(4) always sends 10 byte READ
and WRITE commands, which will cause data corruption when device is
larger than 2^32 sectors.
dim [Sun, 2 Dec 2012 00:31:23 +0000 (00:31 +0000)]
Pull in r158245 from upstream clang:
[C++11 Compat] Fix breaking change in C++11 pair copyctor.
While this code is valid C++98, it is not valid C++11. The problem
can be reduced to:
class MDNode;
class DIType {
operator MDNode*() const {return 0;}
};
class WeakVH {
WeakVH(MDNode*) {}
};
int main() {
DIType di;
std::pair<void*, WeakVH> p(std::make_pair((void*)0, di)));
}
This was not detected by any of the bots we have because they either
compile C++98 with libstdc++ (which allows it), or C++11 with libc++
(which incorrectly allows it). I ran into the problem when compiling
with VS 2012 RC.
Thanks to Richard for explaining the issue.
This fixes building clang 3.1 on stable/9 with libc++ in C++11 mode.
This is a direct commit to stable/9, since there is no separate commit
in head which has just this particular change, and I do not want to do a
full import at this time.
rmacklem [Sat, 1 Dec 2012 01:11:59 +0000 (01:11 +0000)]
MFC: r241568
Add a new '-S' option to mountd, which tells it to suspend
execution of the nfsd threads while it is reloading the exports.
This avoids clients from getting intermittent access errors
when the exports are being reloaded non-atomically.
It is not an ideal solution, since requests will back up while
the nfsd threads are suspended. Also, when this option is used,
if mountd crashes while reloading exports, mountd will have to
be restarted to get the nfsd threads to resume execution.
This has been tested by Vincent Hoffman (vince at unsane.co.uk)
and John Hickey (jh at deterlab.net).
The nfse patch offers a more comprehensive solution for this issue.
rmacklem [Sat, 1 Dec 2012 01:07:51 +0000 (01:07 +0000)]
MFC: r241561
Add two new options to the nfssvc(2) syscall that allow
processes running as root to suspend/resume execution
of the kernel nfsd threads. An earlier version of this
patch was tested by Vincent Hoffman (vince at unsane.co.uk)
and John Hickey (jh at deterlab.net).
mav [Thu, 29 Nov 2012 18:08:36 +0000 (18:08 +0000)]
MFC r242323, r242328:
Add basic BIO_DELETE support to GEOM RAID class for all RAID levels.
If at least one subdisk in the volume supports it, BIO_DELETE requests
will be propagated down. Unfortunatelly, for RAID levels with redundancy
unmapped blocks will be mapped back during first rebuild/resync process.