cache: put all negative entry management code into dedicated functions
==
cache: manage negative entry list with a dedicated lock
Since negative entries are managed with a LRU list, a hit requires a
modificaton.
Currently the code tries to upgrade the global lock if needed and is
forced to retry the lookup if it fails.
Provide a dedicated lock for use when the cache is only shared-locked.
==
cache: defer freeing entries until after the global lock is dropped
This also defers vdrop for held vnodes.
==
cache: improve scalability by introducing bucket locks
An array of bucket locks is added.
All modifications still require the global cache_lock to be held for
writing. However, most readers only need the relevant bucket lock and in
effect can run concurrently to the writer as long as they use a
different lock. See the added comment for more details.
This is an intermediate step towards removal of the global lock.
==
cache: get rid of the global lock
Add a table of vnode locks and use them along with bucketlocks to provide
concurrent modification support. The approach taken is to preserve the
current behaviour of the namecache and just lock all relevant parts before
any changes are made.
Lookups still require the relevant bucket to be locked.
==
cache: ignore purgevfs requests for filesystems with few vnodes
purgevfs is purely optional and induces lock contention in workloads
which frequently mount and unmount filesystems.
In particular, poudriere will do this for filesystems with 4 vnodes or
less. Full cache scan is clearly wasteful.
Since there is no explicit counter for namecache entries, the number of
vnodes used by the target fs is checked.
The default limit is the number of bucket locks.
== (by kib)
Limit scope of the optimization in r306608 to dounmount() caller only.
Other uses of cache_purgevfs() do rely on the cache purge for correct
operations, when paths are invalidated without unmount.
==
cache: split negative entry LRU into multiple lists
This splits the ncneg_mtx lock while preserving the hit ratio at least
during buildworld.
Create N dedicated lists for new negative entries.
Entries with at least one hit get promoted to the hot list, where they
get requeued every M hits.
Shrinking demotes one hot entry and performs a round-robin shrinking of
regular lists.
==
cache: fix up a corner case in r307650
If no negative entry is found on the last list, the ncp pointer will be
left uninitialized and a non-null value will make the function assume an
entry was found.
Fix the problem by initializing to NULL on entry.
== (by kib)
vn_fullpath1() checked VV_ROOT and then unreferenced
vp->v_mount->mnt_vnodecovered unlocked. This allowed unmount to race.
Lock vnode after we noticed the VV_ROOT flag. See comments for
explanation why unlocked check for the flag is considered safe.
==
cache: fix a race between entry removal and demotion
The negative list shrinker can demote an entry with only hotlist + neglist
locks held. On the other hand entry removal possibly sets the NCF_DVDROP
without aformentioned locks held prior to detaching it from the respective
netlist., which can lose the update made by the shrinker.
==
cache: plug a write-only variable in cache_negative_zap_one
==
cache: ensure that the number of bucket locks does not exceed hash size
The size can be changed by side effect of modifying kern.maxvnodes.
Since numbucketlocks was not modified, setting a sufficiently low value
would give more locks than actual buckets, which would then lead to
corruption.
arybchik [Sat, 31 Dec 2016 11:23:43 +0000 (11:23 +0000)]
MFC r310715
sfxge(4): fix GET_RXDP_CONFIG usage for multi-PF on Medford
On Medford, using MC_CMD_GET_RXDP_CONFIG to query the RX end
padding setting is in the ADMIN group, and so fails for
unprivileged functions. In that case, assume the largest size
supported by Medford hardware (256bytes) to prevent overrun.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
arybchik [Sat, 31 Dec 2016 11:23:03 +0000 (11:23 +0000)]
MFC r310714
sfxge(4): support Medford bootcfg partition layout in common code
For Siena and Huntington, the per-port bootcfg (aka expcfg) is
stored in a dedicated 4Kbyte partition for each port.
For Medford, the per-PF bootcfg is stored in a 2Kbyte sector
within a single shared partition. Update the common code to support
the new bootcfg layout.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
loos [Sat, 31 Dec 2016 01:54:48 +0000 (01:54 +0000)]
MFC r309345:
The RX_FREEBUFFER registers are a write to increment field.
Writing the full queue size to it every time was makeing it overflow with a
lot of bogus values.
This fixes the interrupt storms on irq 40.
MFC r309347:
MDIO_PHYACCESS_ACK is only valid for read access, remove it from
miibus_writereg.
Reduce the DELAY() between reads while waiting for MII access.
ngie [Sat, 31 Dec 2016 00:00:12 +0000 (00:00 +0000)]
MFC r310455:
Clarify failure in snmp_output(..) with call to snmp_pdu_decode
- Explicitly test snmp_pdu_encode against SNMP_CODE_OK instead of assuming
any non-zero value is bad.
- Print out the code before calling abort() to give the end-user something
actionable to debug without having to recompile the binary, since the
core might not have these details.
ngie [Fri, 30 Dec 2016 23:52:19 +0000 (23:52 +0000)]
MFC r310458,r310466:
r310458:
Group all loadable modules in the %default section
This will allow new users to uncomment the modules and have things work
with less head scratching, in the event they decide to uncomment any
of the section separators, e.g. %usm or %vcm, as the module loading is
only effective in the %default section.
r310466:
Don't hardcode $(securityModelUSM) (3) in the authPriv example under the %vacm
section
loos [Fri, 30 Dec 2016 20:43:00 +0000 (20:43 +0000)]
MFC r306717:
if_cpsw overhaul:
- Fix RX and TX teardown:
. TX teardown would not reclaim the abandoned descriptors;
. Interrupt storms in RX teardown;
. Fixed the acknowledge of the teardown completion interrupt.
- Remove temporary lists for the descriptors;
- Simplified the descriptor handling (less writes and reads from
descriptors where possible);
- Better debug;
- Add support for the RX threshold interrupts:
With interrupt moderation only, an RX overrun is likely to happen. The
RX threshold is set to trigger a non paced interrupt everytime your RX
free buffers are under the minimum threshold, helping to prevent the rx
overrun.
The NIC now survive when pushed over its limits (where previously it would
lock up in a few seconds).
uFW (600MHz SoC) can now forward up to 560Mb/s of UDP traffic (netmap
pkt-gen as source and sink). TCP forwarding rate is over 350Mb/s.
No difference (other than CPU use) was seen on Beaglebone black (1GHz SoC)
for his fast ethernet.
Tested on: uFW, BBB
Sponsored by: Rubicon Communications, LLC (Netgate)
loos [Fri, 30 Dec 2016 20:33:54 +0000 (20:33 +0000)]
MFC r306376:
Add a sysctl to control the interrupt pacing on AM335x integrated switch.
The hardware can be set to limit the number of interrupts from 2 to 63
interrupts per ms.
To keep the compatibility with the TI documentation the sysctl take the
interval between the interrupts pulses: 16~500 us.
arybchik [Fri, 30 Dec 2016 17:26:19 +0000 (17:26 +0000)]
MFC r310627
sfxge(4): do not limit driver RSS table to RSS channels max
Specification of entire RSS table in the driver allows to spread traffic
more equally across CPUs/RSS channels if number of RSS channels is not
power of 2.
dim [Fri, 30 Dec 2016 09:38:45 +0000 (09:38 +0000)]
MFC r309191 (by rakuco):
fdt: Expect strchr() to return a const char*
In C, strchr(3) returns a char*, whereas C++ defines two overloads:
* const char *strchr(const char*, int)
* char *strchr(char*, int)
Building fdt.cc (with the WITHOUT_GPL_DTC knob set) with libc++ 3.9.0 (imported
in r309124) was failing because libc++ r260377 added the first overload to
string.h, leading to failures such as:
fdt.cc:1638:8: error: cannot initialize a variable of type 'char *' with an
rvalue of type 'const char *'
Just define val as a const char* to fix it.
Upstreamed in https://github.com/davidchisnall/dtc/pull/14
tuexen [Thu, 29 Dec 2016 11:25:41 +0000 (11:25 +0000)]
MFC r310547:
Remove a KASSERT which is not always true.
In case of the empty queue tp->snd_holes and tcp_sackhole_insert()
failing due to memory shortage, tp->snd_holes will be empty.
This problem was hit when stress tests where performed by pho.
emaste [Wed, 28 Dec 2016 21:58:20 +0000 (21:58 +0000)]
Fix EFI self relocation code for rela architectures
MFC r306812 (andrew):
The bootloader self relocation code was slightly wrong for the
R_AARCH64_RELATIVE relocation found on arm64. It would try to add the
contents of the memory location being relocated to the base address and
the relocation addend. This worked when the contents was zero, however
this now seems to be set to the value of the addend so we add this twice.
Fix this by just setting the memory to the computed value.
MFC r309360: EFI loaders: parse rela relocations on amd64
Prior to this change the loader self relocation code interpreted amd64's
rela relocations as if they were rel relocations, discarding the addend.
This "works" because GNU ld 2.17.50 stores the addend value in both the
r_addend field of the relocation (as expected) and at the target of the
relocation.
Other linkers, and possibly other versions of GNU ld, won't have this
behaviour, so interpret the relocations correctly.
jhb [Tue, 27 Dec 2016 20:06:26 +0000 (20:06 +0000)]
MFC 309581,309582,310424: Document T6 support.
309581:
Document support for Terminator 6 adapters in cxgbe(4) and cxgbev(4).
309582:
Bump Dd for addition of T6.
310424:
Replace passive voice with active voice and other tweaks.
- Drop uses of 'will'.
- Replace 'to use' with active voice.
- Tidy language around interrupt types and clarify that INTx doesn't
work on VFs.
- Drop leading articles from sysctl/tunable descriptions.
- Tweak the wording of several sysctl/tunable descriptions.
dim [Mon, 26 Dec 2016 20:36:37 +0000 (20:36 +0000)]
MFC r309124:
Upgrade our copies of clang, llvm, lldb, compiler-rt and libc++ to 3.9.0
release, and add lld 3.9.0. Also completely revamp the build system for
clang, llvm, lldb and their related tools.
Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11
support to build; see UPDATING for more information.
Release notes for llvm, clang and lld are available here:
<http://llvm.org/releases/3.9.0/docs/ReleaseNotes.html>
<http://llvm.org/releases/3.9.0/tools/clang/docs/ReleaseNotes.html>
<http://llvm.org/releases/3.9.0/tools/lld/docs/ReleaseNotes.html>
Thanks to Ed Maste, Bryan Drewery, Andrew Turner, Antoine Brodin and Jan
Beich for their help.
Relnotes: yes
MFC r309147:
Pull in r282174 from upstream llvm trunk (by Krzysztof Parzyszek):
[PPC] Set SP after loading data from stack frame, if no red zone is
present
Follow-up to r280705: Make sure that the SP is only restored after
all data is loaded from the stack frame, if there is no red zone.
This completes the fix for
https://llvm.org/bugs/show_bug.cgi?id=26519.
Pull in r283060 from upstream llvm trunk (by Hal Finkel):
[PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes
soft-float disable all vector instruction sets for both 32-bit and
64-bit modes. This latter part is necessary because the PPC backend
canonicalizes many Altivec vector types to floating-point types, and
so soft-float breaks scalarization support for many operations. Both
for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware
floating-point also disables vector instructions (embedded targets
without hardware floating point support are unlikely to have Altivec,
etc. and operating system kernels desiring not to use floating-point
registers to lower syscall cost are unlikely to want to use vector
registers either). If someone needs this to work, we'll need to
change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is
enabled, hardware floating-point support needs to be expressed as a
positive feature, like the others, and not a negative feature,
because target features cannot have dependencies on the disabling of
some other feature. So +soft-float has now become -hard-float.
Fixes PR26970.
Pull in r283061 from upstream clang trunk (by Hal Finkel):
[PowerPC] Enable soft-float for PPC64, and +soft-float -> -hard-float
Enable soft-float support on PPC64, as the backend now supports it.
Also, the backend now uses -hard-float instead of +soft-float, so set
the target features accordingly.
Fixes PR26970.
Reported by: Mark Millard
PR: 214433
MFC r309212:
Add a few missed clang 3.9.0 files to OptionalObsoleteFiles.
MFC r309262:
Fix packaging for clang, lldb and lld 3.9.0
During the upgrade of clang/llvm etc to 3.9.0 in r309124, the PACKAGE
directive in the usr.bin/clang/*.mk files got dropped accidentally.
Restore it, with a few minor changes and additions:
* Correct license in clang.ucl to NCSA
* Add PACKAGE=clang for clang and most of the "ll" tools
* Put lldb in its own package
* Put lld in its own package
During the bootstrap phase, when building the minimal llvm library on
PowerPC, add lib/Support/Atomic.cpp. This is needed because upstream
llvm revision r271821 disabled the use of std::call_once, which causes
some fallback functions from Atomic.cpp to be used instead.
Reported by: Mark Millard
PR: 214902
MFC r309835:
Tentatively apply https://reviews.llvm.org/D18730 to work around gcc PR
70528 (bogus error: constructor required before non-static data member).
This should fix buildworld with the external gcc package.
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
3.9.1 release.
Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11
support to build; see UPDATING for more information.
Release notes for llvm, clang and lld will be available here:
<http://releases.llvm.org/3.9.1/docs/ReleaseNotes.html>
<http://releases.llvm.org/3.9.1/tools/clang/docs/ReleaseNotes.html>
<http://releases.llvm.org/3.9.1/tools/lld/docs/ReleaseNotes.html>
pfg [Mon, 26 Dec 2016 16:42:38 +0000 (16:42 +0000)]
MFC r310367:
pax(1): Fix a bug with archives smaller than 512 bytes.
The problem here is that the archive is too short (< 512 bytes). The
buffer routines, try to read at least 512 bytes, even when we try to
determine what format file we have, which is wrong.