Add FPU_KERN_NOCTX flag to the fpu_kern_enter() function on amd64.
The flag specifies that the block which uses FPU must be executed in
critical section, i.e. take no context switches, and does not need an
FPU save area during the execution.
It is intended to be applied around fast and short code pathes where
save area allocation is impossible or undesirable, due to context or
due to the relative cost of calculation vs. allocation.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
manu [Sat, 10 Sep 2016 17:45:35 +0000 (17:45 +0000)]
a10_mmc: Remove completly the PIO code now all access is done by DMA.
Rename registers as in the manual.
Do a hard reset of the controller before a soft one.
Since DMA is always used remove dependancy on allwinner_soc_family, it was used
to differentiate SoC as the fdt compatible string were the same.
Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and
into vm/pmap.h, and describe what its purpose is. Eliminate the archaic
"XXX" comment about its value. I don't believe that its exact value, e.g.,
5 versus 6, matters.
Update the arm64 and riscv pmap implementations of pmap_ts_referenced()
to opportunistically update the page's dirty field.
On amd64, use the PDE value already cached in a local variable rather than
dereferencing a pointer again and again.
cache: improve scalability by introducing bucket locks
An array of bucket locks is added.
All modifications still require the global cache_lock to be held for
writing. However, most readers only need the relevant bucket lock and in
effect can run concurrently to the writer as long as they use a
different lock. See the added comment for more details.
This is an intermediate step towards removal of the global lock.
Switch random_get_pseudo_bytes() shim to arc4rand().
Our shim for Solaris random_get_bytes() uses read_random(), that looks
reasonable, since it guaranties reliably seeded random data. On the other
side Solaris random_get_pseudo_bytes() does not provide this guarantie,
and its original Solaris implementation is equivalent to our arc4rand(),
using software crypto without stressing slower hardware RNG.
Pass the trap type and code down from db_trap() to db_stop_at_pc() so
that the latter can easily determine what the trap type actually is
after callers are fixed to encode the type unambigously.
ddb currently barely understands breakpoints, and it treats all
non-breakpoints as single-step traps. This works OK for stopping
after every instruction when single-stepping, but is broken for
single-stepping with a count > 1 (especially with a large count).
ddb needs to stop on the first non-single-step trap while single-
stepping. Otherwise, ddb doesn't even stop the first time for
fatal traps and external breakpoints like the one in kdb_enter().
Fix stopping when the specified breakpoint count is reached. The
countdown was done correctly, but the action when the count was not
reduced to 0 was to fall through to generic code which almost always
stopped.
Give the full syntax of the 'count' arg for all commmands that support
it. This arg is most interesting for the 'break' command where it
never worked, and for the step command where it is powerful but too
fragile to use much.
Give the full syntax of the 'addr' arg for these commands and some
others. Rename it from 'address' for the generic command.
Fix description of how 'count' is supposed to work for the 'break'
command.
Don't (mis)describe the syntax of the comma for the 'step' command.
Expand the description for the generic command.
Give the full syntax for the 'examine' command. It was also missing
the possible values for the modifier.
Fix mdoc syntax error for the 'search' command.
Remove FUD about consequences of not having a trap handler for the
'search' command.
Properly patch up dirname()/basename() calls to not clobber ent->log.
It turns out that we had a couple of more calls to dirname()/basename()
in newsyslog(8) that assume the input isn't clobbered. This is bad,
because it apparently breaks log rotation now that the new dirname()
implementation has been merged.
Fix this by first copying the input and then calling
dirname()/basename(). While there, improve the naming of variables in
this function a bit.
adrian [Fri, 9 Sep 2016 04:45:25 +0000 (04:45 +0000)]
[ath_hal] fixes for finer grain timestamping, some 11n macros
* change the HT_RC_2_MCS to do MCS0..23
* Use it when looking up the ht20/ht40 array for bits-per-symbol
* add a clk_to_psec (picoseconds) routine, so we can get sub-microsecond
accuracy for the math
* .. and make that + clk_to_usec public, so higher layer code that is
returning clocks (eg the ANI diag routines, some upcoming locationing
experiments) can be converted to microseconds.
Whilst here, add a comment in ar5416 so i or someone else can revisit the
latency values.
On big endian hardware that uses 1 byte bool a type mismatch of bool vs int will
cause the least signifcant byte of db_cmd_loop_done to be set, but the MSB to be
read, and read as 0. This causes ddb to stay in an infinite loop.
Take advantage of new bmake feature to only consider Makefile.depend
as invalidating DIRDEPS_CACHE.
When bootstrapping allow more filtering via .MAKE.DEPENDFILE_BOOTSTRAP_SED
Move some comments back to where they make sense.
meta.sys.mk: add META_COOKIE_TOUCH and META_NOPHONY to better handle some
targets in meta mode vs non-meta mode.
Also use .MAKE.META.IGNORE_PATHS to ignore mtime of makefiles - which do
not matter in meta mode.
The initial value of NOASM is nearly the same in all cases and the
initial value of PSEUDO is the same in all cases so reduce duplication
(and hopefully, future merge conflicts) by machine independent defaults.
Document PCI_HP and PCI_IOV kernel options and various tunables in pci(4).
Describe PCI-related kernel options for HotPlug and SR-IOV support in the
pci(4) manual page. While here, add a section describing the various
tunables supported by the PCI bus driver as well.
Reviewed by: wblock
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7754
Sprinkle DOINGASYNC() checks so as to do delayed writes for async
mounts in almost all cases instead of in most cases. Don't override
DOINGASYNC() by any condition except IO_SYNC.
Fix previous sprinking of DOINGASYNC() checks. Don't override IO_SYNC
by DOINGASYNC(). In ffs_write() and ffs_extwrite(), there were
intentional overrides that just broke O_SYNC of data. In
ffs_truncate(), there are 5 calls to ffs_update(), 4 with
apparently-unintentional overrides and 1 without; this had no effect
due to the main async mount hack descibed below.
Fix 1 place in ffs_truncate() where the caller's IO_ASYNC was overridden
for the soft updates case too (to do a delayed write instead of a sync
write). This is supposed to be the only change that affects anything
except async mounts.
In ffs_update(), remove the 19 year old efficiency hack of ignoring
the waitfor flag for async mounts, so that fsync() almost works for
async mounts. All callers are supposed to be fixed to not ask for a
sync update unless they are for fsync() or [I]O_SYNC operations.
fsync() now almost works for async mounts. It used to sync the data
but not the most important metdata (the inode). It still doesn't sync
associated directories.
This gave 10-20% fewer writes for my makeworld benchmark with async
mounted tmp and obj directories from an already small number.
Style fixes:
- in ffs_balloc.c, remove rotted quadruplicated comments about the
simplest part of the DOING*() decisions and rearrange the nearly-
quadruplicated code to be more nearly so.
- in ufs_vnops.c, use a consistent style with less negative logic and
no manual "optimization" of || to | in DOING*() expressions.
etcupdate: preserve the metadata of the destination file
When using diff3 to perform a three-way merge, etcupdate lost the destination
file's metadata. The metadata from the temporary file were used instead.
This was unpleasant for rc.d scripts, which require execute permission.
Use "cat >" to overwrite the destination file's contents while preserving its
metadata.
Fix single-stepping of instructions emulated by vm86.
In vm86.c, fix 2 (rarely used) cases where the return code lost the
single-step indicator. While here, fix 2 misspellings of PSL_T as
PSL_TF (TF is the CPU manufacturer's spelling, but we use T).
In trap.c, turn T_PROTFLT and T_STKFLT into T_TRCTRAP if
vm86_emulate() asked for this (it does this when the instruction is
being traced and was successully emulated). In the kernel case, we
used to deliver the trap as SIGTRAP to the process, where it always
terminated the process; now we deliver the trap as T_TRCTRAP to kdb,
where it normally gives single-stepping. In the user case, the only
difference is that we now clear PSL_T and initialize ucode properly.
On rename, do not perform truncation of dirhash if the vnode
truncation failed.
Doing so resulted in inconsistent state of the ufs dirhash with regard
to the actual directory inode state, and could lead to spurious ENOENT
errors for lookups of existing files in production kernels, or
assertion failures in the debugging kernels.
Change the logic of calling ufsdirhash_dirtrunc() to be same as in
ufs_direnter(). Execute UFS_TRUNCATE() first, log error, and only do
dirtrunc() if UFS_TRUNCATE() succeeded.
Note that the problem was exacerbated by the bug in the
flush_newblk_dep() function (see r305599), which caused in the spurios
errors from ffs_sync() and then ffs_truncate().
In collaboration with: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Do not leak transient ENOLCK error from flush_newblk_dep() loop.
The buffer lock is retried on failed LK_SLEEPFAIL attempt, and error
from the failed attempt is irrelevant. But since there is path after
retry which does not clear error, it is possible to return spurious
error from the function.
The issue resulted in a spurious failure of softdep_sync_buf(),
causing further spurious failure of ffs_sync().
In collaboration with: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
When externding directory inode in ufs_direnter(), adjust i_endoff.
This change is formally not needed, since i_endoff not used in all
code paths after the call to ufs_direnter(), and i_endoff is
recalculated by the next lookup. But having the value correct makes
the reasoning about code simpler.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
In dqsync(), when called from quotactl(), um_quotas entry might appear
cleared since nothing prevents completion of the parallel quotaoff.
There is nothing to sync in this case, and no reason to panic.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
There is no need to upgrade the last dvp lock on lookups for modifying
operations. Instead of upgrading, assert that the lock is exclusive.
Explain the cause in comments.
This effectively reverts r209367.
Tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Partially lift suspension when ffs_reload() finished with cgs and
going to re-read inodes.
Secondary write initiators, e.g. ufs_inactive(), might need to start a
write while owning the vnode lock. Since the suspended state
established by /dev/ufssuspend prevents them from entering
vn_start_secondary_write(), we get deadlock otherwise.
Note that it is arguably not very useful to re-read inodes after
/dev/ufssuspend suspension, because the suspension does not block
readers, and other threads might read existing files in parallel with
suspension owner (for now, only growfs(8)) operations. This
effectively means that suspension owner cannot safely modify existing
inodes, and then there is no sense in re-reading. But keep the code
enabled for now.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
The GUID string provided by hypervisor has leading and trailing braces,
while our GUID string does not have braces at all. Both braces should
be ignored, when the GUID strings are compared.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Modified by: sephe
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7809
In m_devget(), if the data fits in a packet header mbuf, check the amount
of data is less than or equal to MHLEN instead of MLEN when placing initial
small packet header at end of mbuf.
Suffix short month names with "월" and replace %b with %_m for date formats.
This change is analogous to r199179, r199271, and r289041 for japanese and
chinese locales.
5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
This patch simply removes this macro from dsl_dataset.h.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFV r305560: 7278 tuning zfs_arc_max does not impact arc_c_min
When changing zfs_arc_max (e.g. as zdb does), it may be set to less
than the default arc_c_min. arc_c_min should decrease to not be more than
arc_c_max, but it doesn't; therefore tuning of arc_c_max is ineffective.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Author: Matthew Ahrens <mahrens@delphix.com>
The cxgbev/cxlv driver supports Virtual Function devices for Chelsio
T4 and T4 adapters. The VF devices share most of their code with the
existing PF4 driver (cxgbe/cxl) and as such the VF device driver
currently depends on the PF4 driver.
Similar to the cxgbe/cxl drivers, the VF driver includes a t4vf/t5vf
PCI device driver that attaches to the VF device. It then creates
child cxgbev/cxlv devices representing ports assigned to the VF.
By default, the PF driver assigns a single port to each VF.
t4vf_hw.c contains VF-specific routines from the shared code used to
fetch VF-specific parameters from the firmware.
t4_vf.c contains the VF-specific PCI device driver and includes its
own attach routine.
VF devices are required to use a different firmware request when
transmitting packets (which in turn requires a different CPL message
to encapsulate messages). This alternate firmware request does not
permit chaining multiple packets in a single message, so each packet
results in a firmware request. In addition, the different CPL message
requires more detailed information when enabling hardware checksums,
so parse_pkt() on VF devices must examine L2 and L3 headers for all
packets (not just TSO packets) for VF devices. Finally, L2 checksums
on non-UDP/non-TCP packets do not work reliably (the firmware trashes
the IPv4 fragment field), so IPv4 checksums for such packets are
calculated in software.
Most of the other changes in the non-VF-specific code are to expose
various variables and functions private to the PF driver so that they
can be used by the VF driver.
Note that a limited subset of cxgbetool functions are supported on VF
devices including register dumps, scheduler classes, and clearing of
statistics. In addition, TOE is not supported on VF devices, only for
the PF interfaces.
Don't break out of the m_advance() loop if len drops to zero.
If a packet contains the Ethernet header (14 bytes) in the first mbuf
and the payload (IP + UDP + data) in the second mbuf, then the attempt
to fetch the l3hdr will return a NULL pointer. The first loop iteration
will drop len to zero and exit the loop without setting 'p'. However,
the desired data is at the start of the second mbuf, so the correct
behavior is to loop around and let the conditional set 'p' to m_data of
the next mbuf (and leave offset as 0).
andrew [Wed, 7 Sep 2016 16:46:54 +0000 (16:46 +0000)]
When synchronising the instruction and data caches we only need to clean
the data cache to the point of unification. This is the point where the
two caches are unified to a single unified cache so cleaning past here
is just extra unneeded work.
This was noticed when investigating r305545.
Reported by: bz
Obtained from: ABT Systems Ltd
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
andrew [Wed, 7 Sep 2016 16:22:05 +0000 (16:22 +0000)]
Only call cpu_icache_sync_range when inserting an executable page. If the
page is non-executable the contents of the i-cache are unimportant so this
call is just adding unneeded overhead when inserting pages.
While doing research using gem5 with an O3 pipeline and 1k/32k/1M iTLB/L1
iCache/L2 Bjoern Zeeb (bz@) observed a fairly high rate of calls into
arm64_icache_sync_range() from pmap_enter() along with a high number of
instruction fetches and iTLB/iCache hits.
Limiting the calls to arm64_icache_sync_range() to only executable pages,
we observe the iTLB and iCache Hit going down by about 43%. These numbers
are quite misleading when looked at alone as at the same time instructions
retired were reduced by 19.2% and instruction fetches were reduced by 38.8%.
Overall this reduced the runtime of the test program by 22.4%.
On Juno hardware, in steady-state, running the same test, using the cycle
count to determine runtime, we do see a reduction of up to 28.9% in runtime.
While these numbers certainly depend on the program executed, we expect an
overall performance improvement.
Reported by: bz
Obtained from: ABT Systems Ltd
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Due to reading initialized variable, FIS receive area was always allocated
as 256 bytes, suitable for command-based switching, instead of 4096 bytes,
required for FIS-based switching. This caused memory corruption in case of
port multipliers used on FBS-capable HBAs (Marvell).
Fix MIPS INTRNG (both FDT and non-FDT) behaviour broken by r304459
More changes to MIPS may be required, as commented in D7692, but this
revision aims to restore MIPS INTRNG functionality so we can move on
with working interrupts.
Reported by: yamori813@yahoo.co.jp
Tested by: mizhka (on BCM), sgalabov (on Mediatek)
Reviewed by: adrian, nwhitehorn (older version)
Sponsored by: Smartcom - Bulgaria AD
Differential Revision: https://reviews.freebsd.org/D7692