rmacklem [Tue, 4 Jun 2013 22:29:19 +0000 (22:29 +0000)]
MFC: r251079
Post-r248567, there were times when the client would return a
truncated directory for some NFS servers. This turned out to
be because the size of a directory reported by an NFS server
can be smaller than the ufs-like directory created from the
RPC XDR in the client. This patch fixes the problem by changing
r248567 so that vnode_pager_setsize() is only done for regular files.
smh [Tue, 4 Jun 2013 10:47:44 +0000 (10:47 +0000)]
Enhanced BIO_DELETE support for CAM SCSI to add ATA_TRIM support.
Disable CAM BIO queue sorting for non-rotating media by default.
MFC r249939 Added available delete methods discovery during device probe
MFC r249941 Automatically disable BIO queue sorting for non-rotating media
MFC r250033 Correct comment typos
MFC r250179 Update probe flow so that devices with lbp can also disable disksort
MFC r250180 Fix probe in progress check in dareprobe
MFC r250181 Check for ATA Information VPD before querying for ATA
MFC r250183 Enable CAM SCSI to choose ATA TRIM during autodetection
MFC r250967 Enforce validation on the selected delete method via sysctl
jhb [Mon, 3 Jun 2013 17:07:34 +0000 (17:07 +0000)]
MFC 248167:
Fix the 'C' field for a running thread to match the behavior described
in the manpage by having it display the current CPU (ki_oncpu) rather
than the previously used CPU (ki_lastcpu). ki_lastcpu is still used for
all other thread states.
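To make the rule concrete, here is a minimal C sketch of the 'C' column logic. The `struct thread_info`, its state enum, and `NOCPU` are hypothetical stand-ins for the kinfo_proc fields (`ki_oncpu`, `ki_lastcpu`), not ps(1)'s actual code; the `NOCPU` guard is a defensive assumption.

```c
/* Hypothetical stand-ins for the kinfo_proc fields discussed above. */
#define NOCPU (-1)

enum thread_state { TS_RUNNING, TS_SLEEPING, TS_STOPPED };

struct thread_info {
    enum thread_state state;
    int oncpu;   /* CPU the thread is on right now (cf. ki_oncpu) */
    int lastcpu; /* CPU the thread last ran on (cf. ki_lastcpu) */
};

/*
 * Value for the 'C' column: the current CPU for a running thread,
 * the last-used CPU for every other state.
 */
static int cpu_column(const struct thread_info *t)
{
    if (t->state == TS_RUNNING && t->oncpu != NOCPU)
        return t->oncpu;
    return t->lastcpu;
}
```

A running thread on CPU 3 that last ran on CPU 1 shows 3; a sleeping thread shows its last CPU.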
jkim [Mon, 3 Jun 2013 16:46:37 +0000 (16:46 +0000)]
MFC: r251186
Fix a long-standing logic bug introduced in r167814. The code was added to
get the RSDP from a loader(8) hint via kenv(2), but the bug nullified the new
code and we always fell back to the previous method, i.e., sysctlbyname(3).
kib [Mon, 3 Jun 2013 04:48:09 +0000 (04:48 +0000)]
MFC r251033:
When handling an exception from the attempt to load the faulting
context on return from the trap handler, re-enable interrupts on
i386 and amd64.
marius [Sat, 1 Jun 2013 13:10:24 +0000 (13:10 +0000)]
MFC: r249203 (partial)
- Make ata_str2mode() static; it's not used outside of ata-all.c.
- Move ata_timeout() to ata-all.c so we don't need to expose both this
function and ata_cam_end_transaction() but only the former.
- Move ata_cmd2str() from ata-queue.c to ata-all.c.
- Add some missing prototypes.
dim [Fri, 31 May 2013 20:11:07 +0000 (20:11 +0000)]
MFC r251066:
Fix warnings from newer clang versions about constexpr member functions
not being implicitly const in libc++'s <chrono> header. The warnings
have been introduced because of new language rules recently adopted by
the C++ WG. More info:
jhb [Fri, 31 May 2013 14:36:09 +0000 (14:36 +0000)]
MFC 250219:
Fix two bugs in the current NUMA-aware allocation code:
- vm_phys_alloc_freelist_pages() can be called by vm_page_alloc_freelist()
to allocate a page from a specific freelist. In the NUMA case it did not
properly map the public VM_FREELIST_* constants to the correct backing
freelists, nor did it try all NUMA domains for allocations from
VM_FREELIST_DEFAULT.
- vm_phys_alloc_pages() did not pin the thread and each call to
vm_phys_alloc_freelist_pages() fetched the current domain to choose
which freelist to use. If a thread migrated domains during the loop
in vm_phys_alloc_pages() it could skip one of the freelists. If the
other freelists were out of memory then it is possible that
vm_phys_alloc_pages() would fail to allocate a page even though pages
were available resulting in a panic in vm_page_alloc().
jhb [Thu, 30 May 2013 19:14:34 +0000 (19:14 +0000)]
MFC 246417,247116,248584:
Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary. However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).
This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly. Instead of setting PBDRY on individual sleeps, change
the VFS_*() and VOP_*() methods to defer stop signals for filesystems which
request this behavior via a new VFCF_SBDRY flag. Note that this has to be
a VFC flag rather than a MNTK flag so that it works properly with
VFS_MOUNT() when the mount is not yet fully constructed. For now, only the
NFS clients set this new flag in VFS_SET().
A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
the request as being on behalf of the thread performing the lookup
(cnp_thread) rather than using a NULL thread pointer. This causes
NFS to properly handle signals during this VOP on an interruptible
mount.
- Ignore thread suspend requests due to SIGSTOP if stop signals are
currently deferred. This can occur if a process is stopped via
SIGSTOP while a thread is running or runnable but before it has set
TDF_SBDRY.
scottl [Thu, 30 May 2013 16:51:48 +0000 (16:51 +0000)]
MFC r248282, in a modified fashion. From the original changelog:
Add currently unused flag argument to the cluster_read(),
cluster_write() and cluster_wbuild() functions. The flags to be
allowed are a subset of the GB_* flags for getblk().
This merge adds a cluster_*_gb() API variant instead of changing the ABI
with an added argument to the existing API. Most API consumers that were
changed in the original rev have been left unchanged in this merge to
reduce churn. The mergeinfo is recorded though for future merging
convenience. This is effectively a no-op for the moment.
markj [Wed, 29 May 2013 22:30:29 +0000 (22:30 +0000)]
Revert my previous merge. There's a variable name difference between head
and stable (dirfd vs. dir_fd) and I managed to get it wrong again when I
did the MFC, even after I tested.
markj [Wed, 29 May 2013 21:07:24 +0000 (21:07 +0000)]
MFC r250545:
Some filesystems (NFS in particular) do not fill out the d_type field when
returning directory entries through readdir(3). In this case we need to
obtain the file type ourselves; otherwise newsyslog -t will not be able to
find archived log files and will fail to both delete old log files and to
do interval-based rotations properly.
mav [Wed, 29 May 2013 04:17:05 +0000 (04:17 +0000)]
MFC r250508:
Disable sending Early R_OK on SiI3726/SiI3826 port multipliers.
With "cached read" HDD testing and multiple ports busy on a SATA
host controller, 3726/3826 PMP will very rarely drop a deferred
R_OK that was intended for the host. The symptom is that all 5 drives
under test time out, get reset, and recover.
mav [Wed, 29 May 2013 04:12:53 +0000 (04:12 +0000)]
MFC r250819:
Fix the access to the vdc->Secondary_Element_Count metadata field: it is
8 bits wide, not 16. In some cases the wrong width could cause a kernel
panic during failed drive replacement.
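The sketch below illustrates how reading an 8-bit count with a 16-bit access goes wrong. The one-byte-count-plus-neighbor layout and the big-endian decoding are assumptions for illustration, not the real metadata layout. With a count of 3 followed by 0x7f, the 16-bit read yields 895, so a loop bounded by that count would run far past the real elements, which is one plausible route to the panic described above.

```c
#include <stdint.h>

/*
 * Hypothetical record: an 8-bit element count at offset 0,
 * immediately followed by an unrelated byte at offset 1.
 */
enum { COUNT_OFFSET = 0 };

/* Correct: read the field at its real 8-bit width. */
static unsigned read_count8(const uint8_t *rec)
{
    return rec[COUNT_OFFSET];
}

/* Buggy: a 16-bit big-endian read also pulls in the neighboring byte. */
static unsigned read_count16(const uint8_t *rec)
{
    return ((unsigned)rec[COUNT_OFFSET] << 8) | rec[COUNT_OFFSET + 1];
}
```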
marius [Tue, 28 May 2013 20:58:57 +0000 (20:58 +0000)]
MFC: r247570, r247591
- Make tables, device ID strings etc const. This includes #ifdef'ing 0
aac_command_status_table, which is actually unused since r111532.
While at it, make aac_if a pointer to the now const interface tables
instead of copying them over to the softc (this alone already reduces the
size of aac.ko on amd64 by ~1 KiB).
- Remove redundant softc members.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
- Remove redundant bzero(9)'ing of the softc.
- Use pci_enable_busmaster(9) instead of duplicating it.
- Remove redundant checking for PCIM_CMD_MEMEN (resource allocation will
just fail).
- Canonicalize the error messages in case of resource allocation failures.
- Add support for using MSI instead of INTx, controllable via the tunable
hw.aac.enable_msi (defaulting to on).
marius [Tue, 28 May 2013 20:53:26 +0000 (20:53 +0000)]
MFC: r245926, r245931
- Improve some comments.
- Make bge_lookup_{rev,vendor}() static.
- Factor out chip identification rather than duplicating the code.
- Sanitize bge_probe() a bit (don't hardcode buffer sizes, allow
bge_lookup_vendor() to return NULL so the excessive panic() can
be removed there, etc.) and return BUS_PROBE_DEFAULT rather than
hardcoding 0.
- According to the Linux tg3 driver, BCM57791 and BCM57795 aren't
capable of Gigabit Ethernet.
- Check the return value of taskqueue_start_threads().
- Mention NetLink controllers in the fallback description, too.
jhb [Tue, 28 May 2013 18:13:08 +0000 (18:13 +0000)]
MFC 247332:
Add a quirk to disable this driver for certain older laptops with an ICH2
southbridge and an Intel 82815_MC host bridge where the host bridge's
revision is less than 5.
kib [Tue, 28 May 2013 05:51:00 +0000 (05:51 +0000)]
MFC r250853:
Fix wait6(2) on 32-bit architectures and for compat32 by using
the right type for the argument in syscalls.master. Also fix the
posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on
architectures that require padding of 64-bit arguments.
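The padding issue can be pictured as follows: a 32-bit caller passes each 64-bit `off_t` as two 32-bit words, and on ABIs that place 64-bit values in aligned register pairs an unused pad word precedes it. This is a minimal sketch; `struct fallocate32_args`, its field order, and the low-word-first ordering are assumptions for illustration, not FreeBSD's generated freebsd32 argument structures (the real word order depends on endianness).

```c
#include <stdint.h>

/*
 * Hypothetical compat32 argument block for
 * posix_fallocate(int fd, off_t offset, off_t len) on an ABI that
 * aligns 64-bit arguments to an even slot.
 */
struct fallocate32_args {
    uint32_t fd;
    uint32_t pad;        /* alignment padding; its value is ignored */
    uint32_t offset_lo;  /* low word first: an assumption */
    uint32_t offset_hi;
    uint32_t len_lo;
    uint32_t len_hi;
};

/* Rebuild a 64-bit value from its two 32-bit halves. */
static int64_t join64(uint32_t lo, uint32_t hi)
{
    return (int64_t)(((uint64_t)hi << 32) | lo);
}

static int64_t fallocate32_offset(const struct fallocate32_args *a)
{
    return join64(a->offset_lo, a->offset_hi);
}

static int64_t fallocate32_len(const struct fallocate32_args *a)
{
    return join64(a->len_lo, a->len_hi);
}
```

A handler that ignored the pad word and started decoding `offset` one slot early would combine the pad with the low word, presumably the kind of mismatch the fix addresses.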
mckusick [Mon, 27 May 2013 22:18:04 +0000 (22:18 +0000)]
MFC of 250708:
Clean up trailing whitespace.
Submitted by: Andy Kosela
MFC of 250710:
When running the -m option to generate a newfs(8) command suitable for
recreating the filesystem, check for and output the -i, -k, and -l
options if appropriate.
Note the remaining deficiencies of the -m option in the dumpfs(8)
manual page: specifically, that newfs(8) options -E, -R, -S, and -T
are not handled and that -p is not useful and so is omitted.
Also document that newfs(8) options -n and -r are neither checked
for nor output but should be. The -r flag is needed if the filesystem
uses gjournal(8).
trociny [Sun, 26 May 2013 18:54:05 +0000 (18:54 +0000)]
MFC r250405:
Move snmp_hast manual to section 3, where all other manual pages for
bsnmp modules are located.
Section 3 (Library Functions) looks wrong for this manual page, which
contains only a module description; that is why it was initially placed
in section 8 (System Manager's Manual). On the other hand, the manual
pages for all other bsnmpd modules are already in section 3, and having
all pages in one section is more consistent. Also, like the manuals for
other modules, the snmp_hast manual currently contains a LIBRARY section,
which is not good style for section 8.
jlh [Sun, 26 May 2013 14:40:23 +0000 (14:40 +0000)]
MFC r250992:
Rework the comment I initially wrote when SHLIB_LDSCRIPT was introduced.
The build system is really intricate and I had a hard time recalling the
whole picture, even when reading my own words. This one will hopefully
be better.
The description explains why we should not configure the "path",
"host.hostname", "command", "ip4.addr" and "ip6.addr" parameters with
this, but should rather use the historical rc.conf(5) options.
kib [Sat, 25 May 2013 11:05:00 +0000 (11:05 +0000)]
MFC r250505:
- Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The
null_hashget() obtains the reference on the nullfs vnode, which must
be dropped.
- Fix a wart that has existed since nullfs caching was introduced:
do not unlock the lower vnode in nullfs_reclaim_lowervp(). This
should be harmless either way, but now it is also formally safe.
Inform nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set
on the nullfs inode.
- Add a callback to the upper filesystems for the lower vnode
unlinking. When inactivating a nullfs vnode, check if the lower
vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC
on the lower vnode, and reclaim upper vnode if so. This allows
nullfs to purge cached vnodes for the unlinked lower vnode, avoiding
excessive caching.
MFC r250852:
Do not leak the NULLV_NOUNLOCK flag from nullfs_unlink_lowervp()
when the nullfs vnode is not reclaimed. Otherwise, a later
reclamation would not unlock the lower vnode.
pfg [Thu, 23 May 2013 16:39:42 +0000 (16:39 +0000)]
MFC r250823:
grep: change some int types.
Change several int variables to size_t, ssize_t, or ptrdiff_t.
This should fix the bug described in CVE-2012-5667 when an input
line is so long that its length cannot be stored in an int
variable.
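A small C sketch of the kind of type change described: measure a line as a `size_t`, built from the `ptrdiff_t` that pointer subtraction yields, so lengths above `INT_MAX` cannot overflow the way an `int` would. The `line_length()` helper is hypothetical, not grep's actual code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Length of the line at the start of buf, not counting the newline,
 * searching at most len bytes. size_t can describe any object in
 * memory; int cannot describe lines longer than INT_MAX bytes.
 */
static size_t line_length(const char *buf, size_t len)
{
    const char *nl = memchr(buf, '\n', len);

    /* nl - buf is a ptrdiff_t; it is never negative here. */
    return nl != NULL ? (size_t)(nl - buf) : len;
}
```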
jamie [Wed, 22 May 2013 18:26:12 +0000 (18:26 +0000)]
MFC r250804:
Refine the "nojail" rc keyword, adding "nojailvnet" for files that don't
apply to most jails but do apply to vnet jails. This includes adding
a new sysctl "security.jail.vnet" to identify vnet jails.
scottl [Wed, 22 May 2013 08:44:21 +0000 (08:44 +0000)]
MFC r250327
Add a sysctl vfs.read_min to complement the existing vfs.read_max. It
defaults to 1, meaning that it's off.
When read-ahead is enabled on a file, the vfs cluster code deliberately
breaks a read into 2 I/O transactions: one to satisfy the actual read,
and one to perform read-ahead. This makes sense in low-latency
circumstances, but often produces unbalanced I/O transactions that
penalize disks. By setting vfs.read_min, we can tell the algorithm to
fetch a larger transaction than what we asked for, achieving the same
effect as the read-ahead but without the doubled, unbalanced transaction
and the slightly lower latency. This significantly helps our workloads
with video streaming.
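The effect of vfs.read_min on the first transaction can be modeled as rounding a small request up to the minimum. This is a simplified sketch under stated assumptions: sizes are counted in blocks, the result is capped at vfs.read_max, and `first_io_blocks()` is a hypothetical function, not the actual cluster_read() logic.

```c
/*
 * Simplified model (not the actual cluster_read() code): choose how
 * many blocks the first I/O transaction fetches. With read_min at its
 * default of 1 the request size is used as-is; a larger read_min rounds
 * small requests up. The cap at read_max is an assumption of this model.
 */
static unsigned first_io_blocks(unsigned requested, unsigned read_min,
    unsigned read_max)
{
    unsigned nblks = requested;

    if (nblks < read_min)
        nblks = read_min;
    if (nblks > read_max)
        nblks = read_max;
    return nblks;
}
```

With read_min left at 1, a 2-block request stays at 2 blocks; with read_min at 8, it becomes one 8-block transaction instead of a small read plus a separate read-ahead.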
delphij [Wed, 22 May 2013 00:31:33 +0000 (00:31 +0000)]
MFC r250374:
According to the documentation, on Linux, cancel_delayed_work() does not
drain (flush_workqueue() in Linux terms) but instead returns true if
the work was removed before it ran, or false otherwise.
Emulate this by removing the taskqueue_drain() call and returning a value
derived from taskqueue_cancel()'s return value.
This fixes a witness warning caused by calling taskqueue_drain()
with a non-sleepable lock held, like:
taskqueue_drain with the following non-sleepable locks held:
exclusive rw lle (lle) r = 0 (0xfffffe001450b410) locked @
/usr/src/sys/netinet/in.c:1484
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff848d4f7690
kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff848d4f7740
witness_warn() at witness_warn+0x4a8/frame 0xffffff848d4f7800
taskqueue_drain() at taskqueue_drain+0x3a/frame 0xffffff848d4f7840
set_timeout() at set_timeout+0x4a/frame 0xffffff848d4f7860
netevent_callback() at netevent_callback+0x16/frame 0xffffff848d4f7870
arpintr() at arpintr+0x9b5/frame 0xffffff848d4f7930
This does not affect kernels built without OFED.
Reported by: Garrett Cooper <yaneurabeya gmail com>
Use procstat_getprocs(3) for retrieving thread information instead of
direct sysctl calls.
r249669:
Use the more generic procstat_getvmmap(3) for retrieving the VM layout of a process.
r249671:
Use procstat_getgroups(3) for retrieving groups information instead of
direct sysctl.
r249673:
Use procstat_getumask(3) for retrieving umask information instead of
direct sysctl.
r249675:
Use procstat_getrlimit(3) for retrieving rlimit information instead of
direct sysctl calls.
r249678:
Use libprocstat(3) when retrieving binary information for a process.
r249680:
Use libprocstat(3) to retrieve process command line arguments and
environment variables.
r249683:
Use libprocstat(3) to retrieve ELF auxiliary vector.
r249685:
Use procstat_getkstack(3) for retrieving process kernel stacks
instead of direct sysctl calls.
r249686:
Make use of newly added libprocstat(3) ability to extract procstat
info from a process core file.
Now procstat(1) can be run on a process core file, e.g., to get a list
of the files a process had open when it crashed:
root@lisa:/ # procstat -f /root/vi.core
PID COMM FD T V FLAGS REF OFFSET PRO NAME
658 vi text v r r-------- - - - /usr/bin/vi
658 vi ctty v c rw------- - - - /dev/pts/0
658 vi cwd v d r-------- - - - /root
658 vi root v d r-------- - - - /
658 vi 0 v c rw------- 11 3208 - /dev/pts/0
658 vi 1 v c rw------- 11 3208 - /dev/pts/0
658 vi 2 v c rw------- 11 3208 - /dev/pts/0
658 vi 3 v r r----n-l- 1 0 - /tmp/vi.0AYKz3Lps7
658 vi 4 v r rw------- 1 0 - /var/tmp/vi.recover/vi.GaGYsz
658 vi 5 v r rw------- 1 0 - -