mjg [Fri, 3 Jan 2014 16:34:16 +0000 (16:34 +0000)]
Don't check for fd limits in fdgrowtable_exp.
Callers do that already and additional check races with process
decreasing limits and can result in not growing the table at all, which
is currently not handled.
mav [Fri, 3 Jan 2014 15:09:59 +0000 (15:09 +0000)]
Rework NFS Duplicate Request Cache cleanup logic.
- Introduce additional hash to group requests by hash of sockref. This
allows to process TCP acknowledgements without looping though all the cache,
and as result allows to do it every time.
- Indroduce additional callbacks to notify application layer about sockets
disconnection. Without this last few requests processed just before socket
disconnection never processed their ACKs and stuck in cache for many hours.
- Implement transport-specific method for tracking reply acknowledgements.
New implementation does not cross multiple stack layers to get the data and
does not have race conditions that previously made some requests stuck
in cache. This could be done more efficiently at sockbuf layer, but that
would broke some KBIs, while I don't know other consumers for it aside NFS.
- Instead of traversing all DRC twice per request, run cleaning only once
per request, and except in some conditions traverse only single hash slot
at a time.
Together this limits NFS DRC growth only to situations of real connectivity
problems. If network is working well, and so all replies are acknowledged,
cache remains almost empty even after hours of heavy load. Without this
change on the same test cache was growing to many thousand requests even
with perfectly working local network.
As another result this reduces CPU time spent on the DRC handling during
SPEC NFS benchmark from about 10% to 0.5%.
jhb [Thu, 2 Jan 2014 21:26:59 +0000 (21:26 +0000)]
Rework the DSDT generation code a bit to generate more accurate info about
LPC devices. Among other things, the LPC serial ports now appear as
ACPI devices.
- Move the info for the top-level PCI bus into the PCI emulation code and
add ResourceProducer entries for the memory ranges decoded by the bus
for memory BARs.
- Add a framework to allow each PCI emulation driver to optionally write
an entry into the DSDT under the \_SB_.PCI0 namespace. The LPC driver
uses this to write a node for the LPC bus (\_SB_.PCI0.ISA).
- Add a linker set to allow any LPC devices to write entries into the
DSDT below the LPC node.
- Move the existing DSDT block for the RTC to the RTC driver.
- Add DSDT nodes for the AT PIC, the 8254 ISA timer, and the LPC UART
devices.
- Add a "SuperIO" device under the LPC node to claim "system resources"
aling with a linker set to allow various drivers to add IO or memory
ranges that should be claimed as a system resource.
- Add system resource entries for the extended RTC IO range, the registers
used for ACPI power management, the ELCR, PCI interrupt routing register,
and post data register.
- Add various helper routines for generating DSDT entries.
kib [Thu, 2 Jan 2014 18:50:52 +0000 (18:50 +0000)]
Update the description for pmap_remove_pages() to match the modern
times [1]. Assert that the pmap passed to pmap_remove_pages() is only
active on current CPU.
Submitted by: alc [1]
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
lla_lookup() does modification only when LLE_CREATE is specified.
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.
delphij [Thu, 2 Jan 2014 08:10:35 +0000 (08:10 +0000)]
MFV r260155:
When we encounter an I/O error on a piece of metadata while deleting
a file system or zvol, we don't update the bptree_entry_phys_t's
bookmark. This would lead to double free of bp's which will lead to
space map corruption.
Instead of tolerating and allowing the corruption, panic immediately.
See Illumos #4390 for more details.
4391 panic system rather than corrupting pool if we hit bug 4390
marcel [Wed, 1 Jan 2014 22:51:19 +0000 (22:51 +0000)]
Implement atomic_swap_<type>.
The operation was documented and implemented partially (both from a
type and architecture perspective) on 2013-08-21 and got used in
ZFS with revision 260150 (zfeature.c) and since ZFS is supported on
ia64, the lack of having atomic_swap became problem.
glebius [Wed, 1 Jan 2014 21:48:04 +0000 (21:48 +0000)]
- Use counter(9) for node stats updated at a high rate.
- Use simple ++ for rare events.
- Use uma_zone_get_cur() to get knowledge about space left in cache.
- Convert many fields of struct ng_netflow_info to 64 bit.
Tested by: Viktor Velichkin <avisom yandex.ru>
Sponsored by: Nginx, Inc.
neel [Wed, 1 Jan 2014 21:17:08 +0000 (21:17 +0000)]
Restructure the VMX code to enter and exit the guest. In large part this change
hides the setjmp/longjmp semantics of VM enter/exit. vmx_enter_guest() is used
to enter guest context and vmx_exit_guest() is used to transition back into
host context.
Fix a longstanding race where a vcpu interrupt notification might be ignored
if it happens after vmx_inject_interrupts() but before host interrupts are
disabled in vmx_resume/vmx_launch. We now called vmx_inject_interrupts() with
host interrupts disabled to prevent this.
zbb [Wed, 1 Jan 2014 20:26:08 +0000 (20:26 +0000)]
Use only mapped BIOs on ARM
Using unmapped BIOs causes failure inside bus_dmamap_sync, since
this function requires valid MVA address, which is not present
if mapping is not set up.
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
zbb [Wed, 1 Jan 2014 20:03:48 +0000 (20:03 +0000)]
Add polarity and level support to ARM GIC
Add suport for setting triggering level and polarity in GIC.
New function pointer was added to nexus which corresponds
to the function which sets level/sense in the hardware (GIC).
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
* msun/src/s_tanhl.c:
. Remove unused variables.
. Fix location/indentation of comments.
. Use comparison involving ints instead of long double.
. Re-order polynomial evaluation on ld128 for |x| < 0.25.
For now, retain the older order in an "#if 0 ... #else" block.
. Use int comparison to short-circuit the |x| < 1.5 condition.
FreeBSD porting notes: the kernel part of this changeset depends
on Solaris buf(9S) interfaces and are not really applicable for
our use. vdev_disk.c is patched as-is to reduce diverge from
upstream, but vdev_file.c is left intact.
alc [Tue, 31 Dec 2013 18:25:15 +0000 (18:25 +0000)]
Since the introduction of the popmap to reservations in r259999, there is
no longer any need for the page's PG_CACHED and PG_FREE flags to be set and
cleared while the free page queues lock is held. Thus, vm_page_alloc(),
vm_page_alloc_contig(), and vm_page_alloc_freelist() can wait until after
the free page queues lock is released to clear the page's flags. Moreover,
the PG_FREE flag can be retired. Now that the reservation system no longer
uses it, its only uses are in a few assertions. Eliminating these
assertions is no real loss. Other assertions catch the same types of
misbehavior, like doubly freeing a page (see r260032) or dirtying a free
page (free pages are invalid and only valid pages can be dirtied).
Eliminate an unneeded variable from vm_page_alloc_contig().
markj [Tue, 31 Dec 2013 15:45:12 +0000 (15:45 +0000)]
Some DTrace tests (mostly in the pid provider directory) make use of
executable ksh scripts. These are currently not copied into the test
directory the way that compiled executables are, so the tests which make use
of them cannot work. This changes the test Makefile to copy the scripts into
the test directory.
markj [Tue, 31 Dec 2013 15:41:16 +0000 (15:41 +0000)]
Allocate the probe ID unrhdr before the DTrace kld_* event handlers are
registered. Otherwise there is a small window during which probe IDs may be
allocated before the unrhdr is allocated.
markj [Tue, 31 Dec 2013 15:37:51 +0000 (15:37 +0000)]
Revert r260091. The vmem calls seem to be slower than the *_unr() calls that
they replaced, which is important considering that probe IDs are allocated
during process startup for USDT probes.
imp [Tue, 31 Dec 2013 07:36:39 +0000 (07:36 +0000)]
Add support for Samsung K9F2G08U0A (256MiB SLC) NAND found on some old
Atmel boards I have.
# All Samsung, Toshiba and SanDisk parts will need to be in this table
# since they don't conform to the ONFI specification (they are all Toggle
# parts). There's some standards for the additional bytes so there's some hope
# to decode them automatically on a per-vendor basis, but even that has
# problems (and is what motivated the ONFI parameter page).
imp [Tue, 31 Dec 2013 04:40:25 +0000 (04:40 +0000)]
Delete echoed doesn't rub out the previous character, so always
use <backspace> <space> <backspace> instead. This fixes hitting
DELETE instead of BACKSPACE at mountroot> prompt.
edavis [Mon, 30 Dec 2013 23:02:26 +0000 (23:02 +0000)]
For TSO, when the first mbuf contains both the packet header and data, the
header is split out into its own BD for processing by the firmware. When
this split occurred the data length in the BD was not being set correctly
resulting in packet corruption.
dim [Mon, 30 Dec 2013 20:34:53 +0000 (20:34 +0000)]
Similar to r260020, only use -fms-extensions with gcc, for all other
modules which require this flag to compile. Use a GCC_MS_EXTENSIONS
variable, defined in kern.pre.mk, which can be used to easily supply the
flag (or not), depending on the compiler type.
dim [Mon, 30 Dec 2013 19:05:50 +0000 (19:05 +0000)]
For sys/boot/i386 and sys/boot/pc98, separate flags to be passed
directly to the linker (LD_FLAGS) from flags passed indirectly, via the
compiler driver (LDFLAGS).
This is because several Makefiles under sys/boot/i386 and sys/boot/pc98
use ${LD} directly to link, and the normal LDFLAGS value should not be
used in these cases.
markj [Mon, 30 Dec 2013 17:37:32 +0000 (17:37 +0000)]
Now that vmem(9) is available, use vmem arenas to allocate probe and
aggregation IDs, as is done in the upstream illumos code. This still
requires some FreeBSD-specific code, as our vmem API is not identical to the
one in illumos.
trasz [Mon, 30 Dec 2013 12:18:06 +0000 (12:18 +0000)]
Fix extremely slow operation with data digests enabled. This was caused
by receive code waiting for data digest even when the data segment was
empty. It didn't actually read it, but it waited until those four bytes
become available in the socket buffer, i.e. until any other PDU (such as NOP)
came in.
PR: kern/185240
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
scottl [Mon, 30 Dec 2013 01:32:17 +0000 (01:32 +0000)]
Multi-queue NIC drivers and multi-port lagg tend to use the same lower
bits of the flowid as each other, resulting in a poor distribution of
packets among queues in certain cases. Work around this by adding a
set of sysctls for controlling a bit-shift on the flowid when doing
multi-port aggrigation in lagg and lacp. By default, lagg/lacp will
now use bits 16 and higher instead of 0 and higher.
Reviewed by: max
Obtained from: Netflix
MFC after: 3 days
scottl [Mon, 30 Dec 2013 01:16:08 +0000 (01:16 +0000)]
Add the -R option to allow fsck_ffs to restart itself when too many critical
errors have been detected in a particular run.
Clean up the global state variables so that a restart can happen correctly.
Separate the global variables in fsck_ffs and fsdb to their own file. This
fixes header sharing with fscd.
Correctly initialize, static-ize, and remove global variables as needed in
dir.c. This fixes a problem with lost+found directories that was causing
a segfault.
Correctly initialize, static-ize, and remove global variables as needed in
suj.c.
Initialize the suj globals before allocating the disk object, not after.
Also ensure that 'preen' mode doesn't conflict with 'restart' mode
Submitted by: scottl, max
Reviewed by: max, mckusick (earlier version)
Obtained from: Netflix
MFC after: 3 days
kargl [Mon, 30 Dec 2013 01:06:21 +0000 (01:06 +0000)]
* Makefile:
. Hook coshl, sinhl, and tanhl into libm.
. Create symbolic links for corresponding manpages.
. While here remove a nearby extraneous space.
* Symbol.map:
* src/math.h:
. Move coshl, sinhl, and tanhl to their proper locations.
* man/cosh.3:
* man/sinh.3:
* man/tanh.3:
. Update the manpages.
kargl [Mon, 30 Dec 2013 00:51:25 +0000 (00:51 +0000)]
* ld80/k_expl.h:
* ld128/k_expl.h:
. Split out a computational kernel,__k_expl(x, &hi, &lo, &k) from expl(x).
x must be finite and not tiny or huge. The kernel returns hi and lo
values for extra precision and an exponent k for a 2**k scale factor.
. Define additional kernels k_hexpl() and hexpl() that include a 1/2
scaling and are used by the hyperbolic functions.
* ld80/s_expl.c:
* ld128/s_expl.c:
. Use the __k_expl() kernel.
marius [Sun, 29 Dec 2013 23:46:59 +0000 (23:46 +0000)]
- Probe with BUS_PROBE_DEFAULT instead of 0.
- Nuke code setting PCI_POWERSTATE_D0; pci(4) already does that for type 0
devices.
- There's no need to keep track of resource IDs.
- Quiesce the interrupt before actually detaching.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
marius [Sun, 29 Dec 2013 23:05:01 +0000 (23:05 +0000)]
- Probe with BUS_PROBE_DEFAULT instead of 0.
- Nuke code setting PCI_POWERSTATE_D0; pci(4) already does that for type 0
devices.
- Use PCIR_BAR instead of a homegrown macro.
- There's no need to keep track of resource IDs.
- Quiesce the interrupt before actually detaching.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
marius [Sun, 29 Dec 2013 22:56:05 +0000 (22:56 +0000)]
- Probe with BUS_PROBE_DEFAULT instead of 0.
- Nuke code setting PCI_POWERSTATE_D0; pci(4) already does that for type 0
devices.
- Use PCIR_BAR instead of a homegrown macro.
- There's no need to keep track of resource IDs.
- Quiesce the interrupt before actually detaching.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
- Nuke dupe $FreeBSD$.
marius [Sun, 29 Dec 2013 22:43:14 +0000 (22:43 +0000)]
- Add support for using MSI instead of INTx, controllable via the tunable
hw.ral.msi_disable (defaulting to using MSI).
- Probe with BUS_PROBE_DEFAULT instead of 0.
- Nuke code setting PCI_POWERSTATE_D0; pci(4) already does that for type 0
devices.
- Use PCIR_BAR instead of a homegrown macro.
- There's no need to keep track of resource IDs.
- Release resources again in case attaching fails.
- Quiesce the interrupt before detaching.
- Sprinkle const.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
- Trim headers.
- Nuke dupe $FreeBSD$.
glebius [Sun, 29 Dec 2013 22:20:06 +0000 (22:20 +0000)]
Fix couple of bugs from r257692 related to scan of address list on
an interface:
- in in_control() skip over not AF_INET addresses.
- in in_aifaddr_ioctl() and in_difaddr_ioctl() do correct check
of address family, w/o accessing memory beyond struct ifaddr.
marius [Sun, 29 Dec 2013 20:41:32 +0000 (20:41 +0000)]
- Remove a redundant variable in mpt_pci_attach().
- #if 0 the currently unused paired port linking and unlinking of dual
adapters.
- Simplify MSI/MSI-X allocation and release. For a single one, we don't need
to fiddle with the MSI/MSI-X count and pci_release_msi(9) is smart enough
to just do nothing in case of INTx.
- Canonicalize actions taken on attach failure and detach.
- Remove the remainder of incomplete support for older FreeBSD versions.
marius [Sun, 29 Dec 2013 19:32:27 +0000 (19:32 +0000)]
- There's no need to keep track of resource IDs.
- Simplify MSI allocation and release. For a single one, we don't need to
fiddle with the MSI count and pci_release_msi(9) is smart enough to just
do nothing in case of INTx.
- Don't allocate MSI as RF_SHAREABLE.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
markj [Sun, 29 Dec 2013 19:27:32 +0000 (19:27 +0000)]
When clearing relocations to __dtrace* symbols, handle both SHT_REL and
SHT_RELA sections properly instead of assuming that the relocation section
is of type SHT_REL.
marius [Sun, 29 Dec 2013 19:21:59 +0000 (19:21 +0000)]
- Switch to using the common MII bitbang'ing code instead of duplicating it.
- Based on lessons learnt with dc(4) (see r185750), add bus space barriers to
the MII bitbang read and write functions as well as to instances of page
switching.
- Add missing locking to ed_ifmedia_{upd,sts}().
- Canonicalize some messages.
- Based on actual functionality, ED_TC5299J_MII_DIROUT should be rather named
ED_TC5299J_MII_DIRIN.
- Remove unused headers.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.