sephe [Wed, 27 Jan 2016 03:53:30 +0000 (03:53 +0000)]
hyperv/vmbus: Event handling code refactor.
- Use taskqueue instead of swi for event handling.
- Scan the interrupt flags in filter
- Disable ringbuffer interrupt mask in filter to ensure no unnecessary
interrupts.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: adrian, sephe, Dexuan <decui microsoft com>
Approved by: adrian (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4920
jhibbits [Wed, 27 Jan 2016 02:23:54 +0000 (02:23 +0000)]
Convert rman to use rman_res_t instead of u_long
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources. For
now, this is still compatible with u_long.
This is step one in migrating rman to use uintmax_t for resources instead of
u_long.
Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.
This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.
luigi [Wed, 27 Jan 2016 02:08:30 +0000 (02:08 +0000)]
bugfix: the scheduler template (dn_schk) for the round robin scheduler
is followed by another structure (rr_schk) whose size must be set
in the schk_datalen field of the descriptor.
Not allocating the memory may cause other memory to be overwritten
(though dn_schk is 192 bytes and rr_schk only 12 so we may be lucky
and end up in the padding after the dn_schk).
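The allocation problem described above can be sketched in a few lines of C. This is a minimal sketch, not the dummynet code: `struct sched_desc`, `struct sched_base`, `struct rr_private`, `sched_alloc()` and `sched_private()` are hypothetical names, and only the idea of a `schk_datalen`-style length field (and the 192/12 byte sizes) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor: tells the framework how much private
 * data follows the generic scheduler template. */
struct sched_desc {
	const char *name;
	size_t      datalen;	/* plays the role of schk_datalen */
};

/* Hypothetical generic template and scheduler-private trailer. */
struct sched_base { char opaque[192]; };
struct rr_private { int min_q; int max_q; int q_bytes; };

/* Allocate the template plus the trailer in one zeroed block. */
void *
sched_alloc(const struct sched_desc *d)
{
	assert(d != NULL);
	return (calloc(1, sizeof(struct sched_base) + d->datalen));
}

/* The private area starts right after the generic template. */
struct rr_private *
sched_private(void *base)
{
	return ((struct rr_private *)((char *)base +
	    sizeof(struct sched_base)));
}
```

If `datalen` were left at 0, `sched_private()` would return a pointer just past the end of the allocation, which is the overwrite the commit message warns about.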
bdrewery [Wed, 27 Jan 2016 01:33:26 +0000 (01:33 +0000)]
Revert yacc dependency back to pre-r241298.
Several attempts to fix this logic were made after r241298, which were
all reverted, yet this change was not.
The .h file does not depend on the .c file, so do not impose such a
dependency on it. They are generated by the same command but do not
depend on each other. Restore the .ORDER which should handle parallel build
issues. This fixes an actual bug where the .h file is not recreated
when missing [1]. For example:
cd lib/libc
make cleanobj
make nsparser.h
rm nsparser.h
make nsparser.h # will not rebuild nsparser.h
I have been trying to track down a build problem where nsparser.h is
missing when nslexer.o is built. It is possible this is related.
bdrewery [Wed, 27 Jan 2016 01:24:05 +0000 (01:24 +0000)]
Fix DIRDEPS_BUILD after r294752.
DIRDEPS_BUILD does not yet support PROGS having their own dependency
file.
Overriding .MAKE.DEPENDFILE here causes major problems with the meta
mode logic since it creates the Makefile.depend as '.depend' resulting
in infinite loops in make due to dirdeps.mk including .depend endlessly.
glebius [Wed, 27 Jan 2016 00:45:46 +0000 (00:45 +0000)]
Augment struct tcpstat with tcps_states[], which is used for book-keeping
the number of TCP connections by state. Provides a cheap way to get
connection count without traversing the whole pcb list.
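The bookkeeping idea can be illustrated as follows. This is a sketch under assumptions, not the kernel change: the state list is shortened, and `state_count`, `conn_init()`, `conn_set_state()` and `conn_count()` are hypothetical names standing in for a tcps_states[]-style array and its update points.

```c
#include <assert.h>

/* Hypothetical, shortened state list; the real TCP state set is larger. */
enum conn_state { ST_CLOSED, ST_LISTEN, ST_ESTABLISHED, ST_NSTATES };

/* One counter per state, in the spirit of tcps_states[]. */
static unsigned long state_count[ST_NSTATES];

struct conn { enum conn_state state; };

/* Start tracking a connection in its initial state. */
void
conn_init(struct conn *c, enum conn_state s)
{
	c->state = s;
	state_count[s]++;
}

/* Move a connection to a new state, keeping the counters in step. */
void
conn_set_state(struct conn *c, enum conn_state next)
{
	assert(state_count[c->state] > 0);
	state_count[c->state]--;
	state_count[next]++;
	c->state = next;
}

/* Constant-time answer; no walk over the connection list. */
unsigned long
conn_count(enum conn_state s)
{
	return (state_count[s]);
}
```

The cost is one decrement and one increment per state change, in exchange for a lookup that does not depend on the number of connections.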
dteske [Tue, 26 Jan 2016 23:59:30 +0000 (23:59 +0000)]
Add `-k' for dpv(3) `keep_tite' config option
For scripts using dialog(1) several times, it can be visually distracting
running dpv(1) several times amidst other dialogs. The `-k' option, similar
to dialog(1) `--keep-tite', enables the same functionality to smooth ti/te.
dteske [Tue, 26 Jan 2016 23:56:27 +0000 (23:56 +0000)]
Add keep_tite configuration option
Similar to the dialog(3) keep_tite option, it prevents the visually disturbing
initialization or exit that can occur when a script runs
dpv(3), by way of dpv(1), in sequence with other dialog(1) invocations.
jhb [Tue, 26 Jan 2016 19:07:09 +0000 (19:07 +0000)]
Add support to libsysdecode for decoding system call names.
A new sysdecode_syscallname() function accepts a system call code and
returns a string of the corresponding name (or NULL if the code is
unknown). To support different process ABIs, the new function accepts a
value from a new sysdecode_abi enum as its first argument to select the
ABI in use. Current ABIs supported include FREEBSD (native binaries),
FREEBSD32, LINUX, LINUX32, and CLOUDABI64. Note that not all ABIs are
supported by all platforms. In general, a given ABI is only supported
if a platform can execute binaries for that ABI.
To simplify the implementation, libsysdecode's build reuses the
existing pre-generated files from the kernel source tree rather than
duplicating new copies of said files during the build.
kdump(1) and truss(1) now use these functions to map system call
identifiers to names. For kdump(1), a new 'syscallname()' function
consolidates duplicated code from ktrsyscall() and ktrsyscallret().
The Linux ABI no longer requires custom handling for ktrsyscall() and
linux_ktrsyscall() has been removed as a result.
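The lookup described above (an ABI selector plus a system call code, returning a name or NULL) might look roughly like the sketch below. It deliberately does not use libsysdecode: `enum demo_abi`, `demo_syscallname()` and the four-entry tables are hypothetical and only illustrate the shape of a per-ABI table lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ABI selector; the real sysdecode_abi enum has more values. */
enum demo_abi { DEMO_ABI_NATIVE, DEMO_ABI_LINUX, DEMO_ABI_COUNT };

/* Tiny illustrative tables; real ones come from pre-generated files. */
static const char *const native_names[] = { "syscall", "exit", "fork", "read" };
static const char *const linux_names[]  = { "read", "write", "open", "close" };

static const struct {
	const char *const *names;
	size_t             count;
} abi_tables[DEMO_ABI_COUNT] = {
	[DEMO_ABI_NATIVE] = { native_names,
	    sizeof(native_names) / sizeof(native_names[0]) },
	[DEMO_ABI_LINUX]  = { linux_names,
	    sizeof(linux_names) / sizeof(linux_names[0]) },
};

/* Return the name for (abi, code), or NULL when either is unknown. */
const char *
demo_syscallname(enum demo_abi abi, unsigned int code)
{
	if ((unsigned int)abi >= DEMO_ABI_COUNT)
		return (NULL);
	assert(abi_tables[abi].names != NULL);
	if (code >= abi_tables[abi].count)
		return (NULL);
	return (abi_tables[abi].names[code]);
}
```

Returning NULL for an unknown code leaves it to the caller (a tracer, for instance) to decide how to print the raw number.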
imp [Tue, 26 Jan 2016 18:39:31 +0000 (18:39 +0000)]
Default NANO_DRIVE to ada0 not ad0. This shouldn't affect working
configs (since they'd have to change NANO_DRIVE to be ada0 to work),
but will fix old ones that used to work.
avos [Tue, 26 Jan 2016 16:50:59 +0000 (16:50 +0000)]
rtwn: do not start vap when initialization fails
- Start vap(s) (via ieee80211_start_all()) only when initialization
succeeds; stop the first vap otherwise (via ieee80211_stop());
- Do not try to stop a device multiple times
(move (sc->sc_flags & RTWN_RUNNING) check to urtwn_stop_locked()).
hiren [Tue, 26 Jan 2016 16:33:38 +0000 (16:33 +0000)]
Persist timers TCPTV_PERSMIN and TCPTV_PERSMAX are hardcoded to 5 seconds and
60 seconds, respectively. Turn them into sysctls that can be tuned live. The
default values of 5 seconds and 60 seconds have been retained.
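As a rough illustration of what turning the two bounds into tunables means, the sketch below clamps a computed timeout between two runtime-adjustable limits. `persist_min`, `persist_max`, `persist_clamp()` and `persist_set()` are hypothetical names; only the 5 and 60 second defaults come from the commit message, and the setter merely stands in for a sysctl handler.

```c
#include <assert.h>

/* Hypothetical tunables, in seconds; defaults taken from the commit text. */
static int persist_min = 5;
static int persist_max = 60;

/* Clamp a computed persist timeout into [persist_min, persist_max]. */
int
persist_clamp(int computed)
{
	assert(persist_min <= persist_max);
	if (computed < persist_min)
		return (persist_min);
	if (computed > persist_max)
		return (persist_max);
	return (computed);
}

/* Runtime setter standing in for a sysctl handler; rejects bad ranges. */
int
persist_set(int new_min, int new_max)
{
	if (new_min <= 0 || new_min > new_max)
		return (-1);
	persist_min = new_min;
	persist_max = new_max;
	return (0);
}
```

With the defaults untouched, `persist_clamp()` behaves as the hardcoded constants did.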
hselasky [Tue, 26 Jan 2016 15:12:31 +0000 (15:12 +0000)]
LinuxKPI list updates:
- Add some new hlist macros.
- Update existing hlist macros removing the need for a temporary
iteration variable.
- Properly define the RCU hlist macros to be SMP safe with regard
to RCU.
- Make list macro arguments safe by adding a pair of parentheses.
- Prefix the _list_add() and _list_splice() functions with "linux"
to reflect they are LinuxKPI internal functions.
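Why parenthesizing macro arguments matters can be shown with a generic example. The macros below are hypothetical and unrelated to the actual LinuxKPI list macros; they only demonstrate the precedence problem that the extra pair of parentheses avoids.

```c
#include <assert.h>

/* Unparenthesized argument: DOUBLE_UNSAFE(a + b) expands to (a + b * 2). */
#define DOUBLE_UNSAFE(x)	(x * 2)

/* Parenthesized argument: DOUBLE_SAFE(a + b) expands to ((a + b) * 2). */
#define DOUBLE_SAFE(x)		((x) * 2)

/* Both facts can be checked at compile time. */
static_assert(DOUBLE_UNSAFE(1 + 2) == 5, "precedence changes the result");
static_assert(DOUBLE_SAFE(1 + 2) == 6, "argument is evaluated as a unit");

/* Evaluates 1 + 2 * 2 because of precedence in the expansion. */
int
unsafe_result(void)
{
	return (DOUBLE_UNSAFE(1 + 2));
}

/* Evaluates (1 + 2) * 2, which is what the caller meant. */
int
safe_result(void)
{
	return (DOUBLE_SAFE(1 + 2));
}
```

The same reasoning applies when a macro argument is a pointer expression such as `p + 1` that the macro body then dereferences or casts.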
kib [Tue, 26 Jan 2016 14:46:39 +0000 (14:46 +0000)]
Don't clear the software flow control flag before draining for last
close or assert the bug that it is clear when leaving.
Remove an unrelated rotted comment that was attached to the buggy
clearing.
Since draining is not done in more cases, flushing is needed in more
cases, so start fixing flushing:
- do a full flush in ttydisc_close(). State what POSIX requires more
clearly. This was missing ttydevsw_pktnotify() calls to tell the
devsw layer to flush. Hardware tty drivers don't actually flush
since they don't understand this API.
- fix 2 missing wakeups in tty_flush(). Most of the wakeups here are
unnecessary for last close. But ttydisc_close() did one of the
missing ones.
This flow control bug ameliorated the design bug of requiring
potentially unbounded waits in draining. Software flow control is the
easiest way to get an unbounded wait, and a long wait is sometimes
actually useful. Users can type the xoff character on the receiver
and (if ixon is set on the sender) expect the output to be held until
the user is ready for more.
Hardware flow control can also give the unbounded wait, and this bug
didn't affect hardware flow control. Unbounded waits from hardware
flow control take a more unusual configuration. E.g., a terminal
program that controls the modem status lines, or unplugging the cable
in a configuration where this doesn't break the connection.
The design bug is still ameliorated by a newer bug in draining for
last close -- the 1 second timeout. E.g., if the user types the
xoff character and the sender reaches last close, then output is
not resumed and the wait times out after just 1 second. This is
broken, but preferable to an unbounded wait. Before this change,
the output was resumed immediately and usually completed.
br [Tue, 26 Jan 2016 14:34:40 +0000 (14:34 +0000)]
Remove uathload from build due to issue with GCC 5.2.0:
"ld: --relax and -r may not be used together."
Requires fixing ld command line arguments and testing.
skra [Tue, 26 Jan 2016 13:50:44 +0000 (13:50 +0000)]
Make pmap_fault() return values vm subsystem compliant to
simplify their handling in abort_handler(). While here,
remove one extra initialization of pcb variable.
mav [Tue, 26 Jan 2016 13:45:41 +0000 (13:45 +0000)]
MFV r294819: 6495 Fix mutex leak in dmu_objset_find_dp
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Albert Lee <trisk@omniti.com>
Author: Steven Hartland <steven.hartland@multiplay.co.uk>
mav [Tue, 26 Jan 2016 13:37:30 +0000 (13:37 +0000)]
MFV r294816: 4986 receiving replication stream fails if any snapshot
exceeds refquota
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
Author: Dan McDonald <danmcd@omniti.com>
This allows doing a full (non-incremental) send and receiving it as a clone
of an existing dataset. It can leverage nopwrite to share blocks with the
origin. This can be used to change the relationship of datasets on the
target. For example, maybe on the source you have:
A ---- B ---- C
And you have sent to the target a full of B, and the incremental B->C:
B ---- C
You later realize that you want to have A on the target. You will have to
do a full send of A, but nopwrite can save you space on the target if you
receive it as a clone of B, assuming that A and B have some blocks in
common:
mav [Tue, 26 Jan 2016 13:03:01 +0000 (13:03 +0000)]
MFV r294812: 6434 sa_find_sizes() may compute wrong SA header size
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Andriy Gapon <avg@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: James Pan <jiaming.pan@yahoo.com>
mav [Tue, 26 Jan 2016 12:58:58 +0000 (12:58 +0000)]
MFV r294810: 6414 vdev_config_sync could be simpler
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Will Andrews <will@firepipe.net>
mav [Tue, 26 Jan 2016 12:52:16 +0000 (12:52 +0000)]
MFV r294806: 6388 Failure of userland copy should return EFAULT
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Richard Yao <ryao@gentoo.org>
mav [Tue, 26 Jan 2016 12:50:14 +0000 (12:50 +0000)]
MFV r294804: 6386 Fix function call with uninitialized value in vdev_inuse
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Richard Yao <ryao@gentoo.org>
mav [Tue, 26 Jan 2016 12:48:10 +0000 (12:48 +0000)]
MFV r294802: 6334 Cannot unlink files when over quota
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Simon Klinkert <simon.klinkert@gmail.com>
mav [Tue, 26 Jan 2016 12:44:49 +0000 (12:44 +0000)]
MFV r294800: 6385 Fix unlocking order in zfs_zget
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Andriy Gapon <avg@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Richard Yao <ryao@gentoo.org>
skra [Tue, 26 Jan 2016 10:24:18 +0000 (10:24 +0000)]
Don't do icache sync on kernel memory and keep in line with comment
in elf_cpu_load_file(). The only time when the sync is needed is after
kernel module is loaded and the relocation info is processed. And it's
done in elf_cpu_load_file().
sephe [Tue, 26 Jan 2016 09:42:13 +0000 (09:42 +0000)]
hyperv/hn: Improve sending performance
- Avoid main lock contention by trylock for if_start, if that fails,
schedule TX taskqueue for if_start
- Don't do direct sending if the packet to be sent is large, e.g.
TSO packet.
This change gives me stable 9.1Gbps TCP sending performance w/ TSO
over a 10GbE directly connected network (the performance fluctuated
between 4Gbps and 9Gbps before this commit). It also improves non-
TSO TCP sending performance a lot.
Reviewed by: adrian, royger
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5074
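The trylock-then-defer pattern from the first bullet of this commit can be sketched in portable C11. This is not the hn(4) driver code: `tx_lock`, `tx_trylock()`, `tx_unlock()` and `tx_start()` are hypothetical, an `atomic_flag` stands in for the driver's main lock, and a counter stands in for scheduling a TX taskqueue.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag tx_lock = ATOMIC_FLAG_INIT;	/* stands in for the main lock */
static atomic_int deferred_runs;		/* times work was handed off */
static atomic_int direct_sends;			/* times we sent inline */

/* Try to take the lock without blocking; 1 on success, 0 if it is held. */
int
tx_trylock(void)
{
	return (!atomic_flag_test_and_set(&tx_lock));
}

/* Release the lock taken by tx_trylock(). */
void
tx_unlock(void)
{
	atomic_flag_clear(&tx_lock);
}

/*
 * Send inline when the lock is free; otherwise hand the work off
 * instead of spinning or sleeping on the lock.
 */
int
tx_start(void)
{
	if (!tx_trylock()) {
		deferred_runs++;	/* a real driver would enqueue a task here */
		return (0);
	}
	direct_sends++;
	tx_unlock();
	return (1);
}
```

The caller never waits: contention turns into deferred work rather than time spent blocked on the lock.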
kib [Tue, 26 Jan 2016 07:57:44 +0000 (07:57 +0000)]
Restore flushing of output for revoke(2) again. Document revoke()'s
intended behaviour in its man page. Simplify tty_drain() to match.
Don't call ttydevsw methods in tty_flush() if the device is gone
since we now sometimes call it then.
The flushing was supposed to be implemented by passing the FNONBLOCK
flag to VOP_CLOSE() for revoke(). The tty driver is one of the few
that can block in close and was one of the fewer that knew about this.
This almost worked in FreeBSD-1 and similarly in Net/2. These
versions only almost worked because there was and is considerable
confusion between IO_NDELAY and FNONBLOCK (aka O_NONBLOCK). IO_NDELAY
is only valid for VOP_READ() and VOP_WRITE(). For other VOPs it has
the same value as O_SHLOCK. But since vfs_subr.c and tty.c
consistently used the wrong flag and the O_SHLOCK flag is rarely set,
this mostly worked. It also gave the feature that applications could
get the non-blocking close by abusing O_SHLOCK.
This was first broken then fixed in 1995. I changed only the tty
driver to use FNONBLOCK, as a hack to get non-blocking via the normal
flag FNONBLOCK for last closes. I didn't know about revoke()'s use
of IO_NDELAY or change it to be consistent, so revoke() was broken.
Then I changed revoke() to match.
This was next broken in 1997 then fixed in 1998. Importing Lite2 made
the flags inconsistent again by undoing the fix only in vfs_subr.c.
This was next broken in 2008 by replacing everything in tty.c and not
checking any flags in last close. Other bugs in draining limited the
resulting unbounded waits to drain in some cases.
It is now possible to fix this better using the new FREVOKE flag.
Just restore flushing for revoke() for now. Don't restore or undo any
hacks for ordinary last closes yet. But remove dead code in the
1-second relative timeout (r272789). This did extra work to extend
the buggy draining for revoke() for as long as possible. The 1-second
timeout made this not very long by usually flushing after 1 second.
cy [Tue, 26 Jan 2016 07:06:44 +0000 (07:06 +0000)]
Add support for automatic leap-second file updates.
The working copy of leapfile resides in /var/db/ntpd.leap-seconds.list.
/etc/ntp/leap-seconds (periodically updated from ftp://time.nist.gov/pub/
or ftp://tycho.usno.navy.mil/pub/ntp/) contains the master copy should
automatic leapfile updates be disabled (default).
Automatic leapfile updates are fetched from $ntp_leapfile_sources,
defaulting to https://www.ietf.org/timezones/data/leap-seconds.list,
within $ntp_leapfile_expiry_days (default 30 days) from leap-seconds
file expiry. Automatic updates can be enabled by setting
$daily_ntpd_leapfile_enable="YES" in periodic.conf. To avoid congesting
the ntp leapfile source the automatic update is randomized by default but
can be disabled through daily_ntpd_avoid_congestion="NO" in
periodic.conf.
imp [Tue, 26 Jan 2016 06:26:56 +0000 (06:26 +0000)]
Allow new lines as white space for arguments that are parsed to allow
boot1 to pass in files with newlines in them. Now that the EFI loader
groks foo=bar on the command line, this can allow a more general setup
than traditional boot loader args will allow.
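A tokenizer that treats newlines like any other white space could look like the sketch below. It is an illustration only, not the loader's parser: `is_space()` and `split_args()` are hypothetical names, and quoting or escaping is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Treat space, tab, carriage return and newline alike as separators. */
static int
is_space(char c)
{
	return (c == ' ' || c == '\t' || c == '\n' || c == '\r');
}

/*
 * Split buf in place into at most maxargs NUL-terminated tokens.
 * Returns the number of tokens stored in argv.
 */
int
split_args(char *buf, char **argv, int maxargs)
{
	int argc = 0;

	assert(buf != NULL && argv != NULL);
	while (*buf != '\0' && argc < maxargs) {
		while (is_space(*buf))
			buf++;
		if (*buf == '\0')
			break;
		argv[argc++] = buf;
		while (*buf != '\0' && !is_space(*buf))
			buf++;
		if (*buf != '\0')
			*buf++ = '\0';
	}
	return (argc);
}
```

Fed the contents of a file with one `foo=bar` setting per line, this yields the same argument vector as the equivalent space-separated command line.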
luigi [Tue, 26 Jan 2016 04:48:24 +0000 (04:48 +0000)]
Revert one chunk from commit 285362, which introduced an off-by-one error
in computing a shift index. The error was due to the use of mixed
fls() / __fls() functions in another implementation of qfq.
To keep the problem from recurring, properly document which
incarnation of the function we need.
Note that the bug only affects QFQ in FreeBSD head from last July, as
the patch was not merged to other versions.
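The off-by-one comes from two conventions for "find last set bit": fls(3) returns a 1-based bit index (and 0 for an empty mask), while the Linux kernel's __fls() returns a 0-based index and is undefined for 0. The sketch below re-implements both conventions under the hypothetical names `demo_fls()` and `demo_fls0()` so the difference is visible; it operates on `unsigned int` for simplicity and is not the QFQ code.

```c
#include <assert.h>

/*
 * 1-based index of the most significant set bit; 0 when no bit is set.
 * This matches the documented behaviour of fls(3).
 */
int
demo_fls(unsigned int mask)
{
	int bit = 0;

	while (mask != 0) {
		bit++;
		mask >>= 1;
	}
	return (bit);
}

/*
 * 0-based index of the most significant set bit, in the style of the
 * Linux kernel's __fls(). The result is undefined for 0, so assert.
 */
int
demo_fls0(unsigned int mask)
{
	assert(mask != 0);
	return (demo_fls(mask) - 1);
}
```

Code ported between the two conventions needs a `- 1` (or `+ 1`) at every call site that uses the result as a shift count or array index.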
jhibbits [Tue, 26 Jan 2016 04:41:18 +0000 (04:41 +0000)]
Older Book-E processors (e500v1/e500v2) don't support dcbzl.
The only difference between dcbzl and dcbz is dcbzl operates on native cache
line lengths regardless of L1CSR0[DCBZ32]. Since we don't change the cache line
size, the cacheline_size variable will reflect the used cache line length, and
dcbz will work as expected.
andrew [Mon, 25 Jan 2016 23:04:40 +0000 (23:04 +0000)]
Allow us to be told about memory past the first 4GB point, but ignore it.
This allows, for example, UEFI to pass a memory map with some RAM in this
region, but for us to ignore it. This is the case when running under the
qemu virt machine type.
bdrewery [Mon, 25 Jan 2016 22:29:44 +0000 (22:29 +0000)]
Fix PROGS not reading .depend files after r284288 by making DEPENDFILE work.
We have had this user-modifiable DEPENDFILE variable forever that does nothing
relevant for the user since fmake always used '.depend'. Bmake
introduced the .MAKE.DEPENDFILE variable that can be modified to change
the name of '.depend'.
Prior to r284288, bsd.progs.mk was setting .MAKE.DEPENDFILE to allow
working incremental builds. This was modified most likely to not
conflict with the META MODE handling of .MAKE.DEPENDFILE as it has a lot
more special logic for that variable.
bdrewery [Mon, 25 Jan 2016 22:29:32 +0000 (22:29 +0000)]
Fix incremental build of dtrace probes.
Currently dtrace(1) -Go does not properly rebuild the target if it
exists. It results in missing symbols.
dtrace -C -x nolibs -G -o usdt.o -s /root/git/freebsd/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/json/usdt.d tst.usdt.o
dtrace: target object (usdt.o) already exists. Please remove the target
dtrace: object and rebuild all the source objects if you wish to run the DTrace
dtrace: linking process again
cc -O2 -pipe -O0 -g -I/root/git/freebsd/cddl/usr.sbin/dtrace/tests/common/json -std=gnu99 -fstack-protector-strong -Qunused-arguments -o tst.usdt.exe.full tst.usdt.o usdt.o
tst.usdt.o: In function `main':
/root/git/freebsd/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/json/tst.usdt.c:56: undefined reference to `__dtrace_bunyan_fake___log__debug'
/root/git/freebsd/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/json/tst.usdt.c:60: undefined reference to `__dtrace_bunyan_fake___log__debug'
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [tst.usdt.exe.full] Error code 1
jamie [Mon, 25 Jan 2016 22:14:31 +0000 (22:14 +0000)]
Allow the (old rc-style) exec_afterstart jail parameters to start numbering
at 0, like exec_prestart and the others do. Make param0 optional, i.e.
still look for param1.
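The numbering rule (start at 0, but keep looking at 1 when 0 is absent) can be expressed as a small loop. The sketch below is written in C purely for illustration and is not the actual rc-style code; `struct kv`, `kv_get()` and `count_numbered()` are hypothetical stand-ins for a variable lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for rc.conf-style variables. */
struct kv { const char *key; const char *val; };

/* Linear lookup; returns the value or NULL when the key is not set. */
const char *
kv_get(const struct kv *tab, int n, const char *key)
{
	for (int i = 0; i < n; i++)
		if (strcmp(tab[i].key, key) == 0)
			return (tab[i].val);
	return (NULL);
}

/* Count consecutive <prefix>N entries; N may start at 0 or at 1. */
int
count_numbered(const struct kv *tab, int n, const char *prefix)
{
	char key[64];
	int found = 0;

	assert(strlen(prefix) < sizeof(key) - 12);
	for (int i = 0; ; i++) {
		snprintf(key, sizeof(key), "%s%d", prefix, i);
		if (kv_get(tab, n, key) == NULL) {
			if (i == 0)
				continue;	/* entry 0 is optional */
			break;
		}
		found++;
	}
	return (found);
}
```

Existing configurations that begin at 1 keep working, while new ones may begin at 0.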