Alan Cox [Tue, 1 Aug 2017 03:51:26 +0000 (03:51 +0000)]
The blist_meta_* routines that process a subtree take arguments 'radix' and
'skip', which denote, respectively, the largest number of blocks that can be
managed by a subtree of that height, and one less than the number of nodes
in a subtree of that height. This change removes the 'skip' argument from
those functions because 'skip' can be trivially computed from 'radius'.
This change also redefines 'skip' so that it denotes the number of nodes in
the subtree, and so changes loop upper bound tests from '<= skip' to '<
skip' to account for the change.
The 'skip' field is also removed from the blist struct.
The self-test program is changed so that the print command includes the
cursor value in the output.
Submitted by: Doug Moore <dougm@rice.edu>
MFC after: 1 week
Dmitry Chagin [Tue, 1 Aug 2017 03:40:19 +0000 (03:40 +0000)]
Implement proper Linux /dev/fd and /proc/self/fd behavior by adding
Linux specific things to the native fdescfs file system.
Unlike FreeBSD, the Linux fdescfs is a directory containing a symbolic
links to the actual files, which the process has open.
A readlink(2) call on this file returns a full path in case of regular file
or a string in a special format (type:[inode], anon_inode:<file-type>, etc..).
As well as in a FreeBSD, opening the file in the Linux fdescfs directory is
equivalent to duplicating the corresponding file descriptor.
Here we have mutually exclusive requirements:
- in case of readlink(2) call fdescfs lookup() method should return VLNK
vnode otherwise our kern_readlink() fail with EINVAL error;
- in the other calls fdescfs lookup() method should return non VLNK vnode.
For what new vnode v_flag VV_READLINK was added, which is set if fdescfs has beed
mounted with linrdlnk option an modified kern_readlinkat() to properly handle it.
For now For Linux ABI compatibility mount fdescfs volume with linrdlnk option:
mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd
Ian Lepore [Mon, 31 Jul 2017 22:00:00 +0000 (22:00 +0000)]
Bugfixes and enhancements...
Don't enable the oscillator when it is found to be stopped at init time,
just let the first setting of valid time start it. But still report a dead
battery if it's stopped at init time.
Don't force the chip into 24hr mode, just cope with whatever mode it is
already in.
Schedule the clock_settime() callbacks to align the RTC clock to top of
second when setting it.
Ian Lepore [Mon, 31 Jul 2017 21:53:00 +0000 (21:53 +0000)]
No need to call getnanotime() now that the waiting is done by the central
subr_rtc code, switch from CLOCKF_SETTIME_NO_TS to CLOCKF_SETTIME_NO_ADJ
so that we get fed a timestamp, but it's not adjusted to compensate for
inaccuracy in setting time.
Avoid reading a snapshot block when it is already in the cache.
Update the use of the B_CACHE flag (since the May 1999 commit
that made it the correct test here).
Reported by: Andreas Longwitz <longwitz@incore.de>
Reviewed by: kib
Tested by: Peter Holm
MFC after: 1 week
Mark Johnston [Mon, 31 Jul 2017 18:48:58 +0000 (18:48 +0000)]
Batch v_wire_count decrements in vm_hold_free_pages().
Atomic updates to v_wire_count are a significant source of contention, so
combine multiple updates into one in this easy case. Also remove an old
printf that gets executed if the page is shared-busied, which is a case
that will lead to a panic anyway.
Adrian Chadd [Mon, 31 Jul 2017 17:33:57 +0000 (17:33 +0000)]
[wlanwds] allow for a DWDS AP VAP to be not be the first VAP on a NIC.
The wlanwds code was just creating a clone VAP without specifying the MAC
address to use for said clone VAP. This meant that if an interface
was cloned from an AP interface that wasn't the first created VAP
(which shares the same MAC as the parent physical interface by default)
then the cloned interface would have the wrong MAC and traffic wouldn't work.
Besides chip bugs (ha!) this isn't a requirement.
So, teach wlanwds to:
* look up the link layer address for a given interface (which really should
be a library interface, and will likely quickly become one);
* use this when creating a cloned interface for a DWDS peer;
* (net80211 already has the infrastructure to do this, it just needed to be
used);
* add some extra logging to see what MAC addresses, parent interfaces, etc
are being created.
Whilst here, add a reminder that I should extend this to include monitoring
a specific VAP for DWDS updates rather than just the parent interface.
This is the first step in allowing for multiple DWDS hops, which is a
pre-requisite for adrian's house having wifi in the single upstairs room.
Tested:
* AR9380, DWDS AP + AP mode - with DWDS AP being the second VAP created
with a different MAC address;
* AR9331 (Carambola2), AP + DWDS STA;
* passing traffic
TODO:
* fix 802.11s so this DWDS stuff is no longer required!
Ian Lepore [Mon, 31 Jul 2017 15:24:40 +0000 (15:24 +0000)]
Check the clock-halted flag every time the clock is read, not just once
at startup. The flag stays set until the clock is loaded with good time,
so we need to keep saying the time is invalid until that happens.
Alexander Motin [Mon, 31 Jul 2017 15:23:19 +0000 (15:23 +0000)]
Improve FHA locality control for NFS read/write requests.
This change adds two new tunables, allowing to control serialization for
read and write NFS requests separately. It does not change the default
behavior since there are too many factors to consider, but gives additional
space for further experiments and tuning.
The main motivation for this change is very low write speed in case of ZFS
with sync=always or when NFS clients requests sychronous operation, when
every separate request has to be written/flushed to ZIL, and requests are
processed one at a time. Setting vfs.nfsd.fha.write=0 in that case allows
to increase ZIL throughput by several times by coalescing writes and cache
flushes. There is a worry that doing it may increase data fragmentation
on disks, but I suppose it should not happen for pool with SLOG.
Ian Lepore [Mon, 31 Jul 2017 14:57:02 +0000 (14:57 +0000)]
Switch from using iic_transfer() to iicdev_readfrom/writeto(), mostly so
that transfers will be done with proper ownership of the bus. No
behavioral changes. Also add a detach() method.
Andrew Gallatin [Mon, 31 Jul 2017 14:56:35 +0000 (14:56 +0000)]
Don't request CTLTYPE_OPAQUE if we can't print them.
The intent is to skip expensive opaque sysctls like tcp_pcblist unless
they are explicitly requested. Sysctl nodes like this don't show up in
sysctl -a, but they do generate output that winds up being dropped,
unless the user specifically requested binary/hex output or opaques.
This reduces the runtime of sysctl in many circumstances on a loaded
system. It also reduces the likelihood that simply gathering
diagnostics on a sick machine (stuck lock, etc) via sysctl -a might
push it over the edge into a total lockup.
Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook
from enc_hhook().
This should solve the problem when pf is used with if_enc(4) interface,
and outbound packet with existing PCB checked by pf, and this leads to
deadlock due to pf does its own PCB lookup and tries to take rlock when
wlock is already held.
Now we pass PCB pointer if it is known to the pfil hook, this helps to
avoid extra PCB lookup and thus rlock acquiring is not needed.
For inbound packets it is safe to pass NULL, because we do not held any
PCB locks yet.
How network VF works with hn(4) on Hyper-V in non-transparent mode:
- Each network VF has a cooresponding hn(4).
- The network VF and the it's cooresponding hn(4) have the same hardware
address.
- Once the network VF is up, e.g. ifconfig VF up:
o All of the transmission should go through the network VF.
o Most of the reception goes through the network VF.
o Small amount of reception may go through the cooresponding hn(4).
This reception will happen, even if the the cooresponding hn(4) is
down. The cooresponding hn(4) will change the reception interface
to the network VF, so that network layer and application layer will
be tricked into thinking that these packets were received by the
network VF.
o The cooresponding hn(4) pretends the physical link is down.
- Once the network VF is down or detached:
o All of the transmission should go through the cooresponding hn(4).
o All of the reception goes through the cooresponding hn(4).
o The cooresponding hn(4) fallbacks to the original physical link
detection logic.
All these features are mainly used to help live migration, during which
the network VF will be detached, while the network communication to the
VM must not be cut off. In order to reach this level of live migration
transparency, we use failover mode lagg(4) with the network VF and the
cooresponding hn(4) attached to it.
To ease user configuration for both network VF and non-network VF, the
lagg(4) will be created by the following rules, and the configuration
of the cooresponding hn(4) will be applied to the lagg(4) automatically.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11635
Ian Lepore [Mon, 31 Jul 2017 01:36:51 +0000 (01:36 +0000)]
Use the new clock_schedule() to arrange for clock_settime() to be called
at the right time to keep the RTC hardware time in sync, instead of using
pause_sbt() to sleep until the right time.
Ian Lepore [Mon, 31 Jul 2017 01:18:21 +0000 (01:18 +0000)]
Add clock_schedule(), a feature that allows realtime clock drivers to
request that their clock_settime() methods be called at a given offset
from top-of-second. This adds a timeout_task to the rtc_instance so that
each clock can be separately added to taskqueue_thread with the scheduling
it prefers, instead of looping through all the clocks at once with a
single task on taskqueue_thread. If a driver doesn't call clock_schedule()
the default is the old behavior: clock_settime() is queued immediately.
The motivation behind this is that I was on the path of adding identical
code to a third RTC driver to figure out a delta to top-of-second and
sleep for that amount of time because writing the the RTC registers resets
the hardware's concept of top-of-second. (Sometimes it's not top-of-second,
some RTC clocks tick over a half second after you set their time registers.)
Worst-case would be to sleep for almost a full second, which is a rude thing
to do on a shared task queue thread.
Ian Lepore [Mon, 31 Jul 2017 00:54:50 +0000 (00:54 +0000)]
Add taskqueue_enqueue_timeout_sbt(), because sometimes you want more control
over the scheduling precision than 'ticks' can offer, and because sometimes
you're already working with sbintime_t units and it's dumb to convert them
to ticks just so they can get converted back to sbintime_t under the hood.
- Remove 'if_rtwn_load="YES"' line from loader.conf; the module was
renamed in r319733 + it will be loaded automatically as a dependency.
- Move new sentence to new line.
- Add short description for dev.rtwn.%d.rx_buf_size tunable.
Since device can pass multiple frames in a single payload temporary
Rx buffer was big enough to hold all of them; now the driver can
concatenate a single frame from multiple payloads.
The Rx buffer size may be configured via tunable (dev.rtwn.%d.rx_buf_size).
Tested with:
- rtl8188cus, rtl8188eu and rtl8821au (STA mode).
- (by kevlo) rtl8192cu and rtl8188eu.
Scott Long [Sun, 30 Jul 2017 22:34:24 +0000 (22:34 +0000)]
Change from using underbar function names to normal function names for
the informational print functions. Collapse the debug API a bit to be
more generic and not require as much code duplication. While here, fix
a bug in MPS that was already fixed in MPR.
Ian Lepore [Sun, 30 Jul 2017 19:58:31 +0000 (19:58 +0000)]
Fix AM/PM mode handling. The bits to mask off in the hours register changes
between 12/24 hour mode. Also fix conversion between 12 and 24 hour mode.
It's not as easy as adding/subtracting 12, because the clock doesn't roll
over 11->0, it rolls over 12->1; 0 isn't a valid hour in AM/PM mode.
Ian Lepore [Sun, 30 Jul 2017 18:46:38 +0000 (18:46 +0000)]
Bugfixes and enhancements...
Don't enable the oscillator when it is found to be stopped at init time,
just let the first setting of valid time start it. But still report a dead
battery if it's stopped at init time.
Don't force the chip into 24hr mode, just cope with whatever mode it is
already in.
Align the RTC clock to top of second when setting it.
Ian Lepore [Sun, 30 Jul 2017 16:17:06 +0000 (16:17 +0000)]
Switch from using iic_transfer() to iicdev_readfrom/writeto(), mostly so
that transfers will be done with proper ownership of the bus. No
behavioral changes.
Alexander Motin [Sun, 30 Jul 2017 15:19:07 +0000 (15:19 +0000)]
Attach ichwd(4) only to ISA bus of the LPC bridge.
Resource allocation for parent device does not look good by itself, but
attempt to allocate them for unrelated device just does not end up good.
On Asus X99-E WS/USB3.1 system reporting ISA bridge via both PCI and ACPI
this reported to cause kernel panic on shutdown due to messed resources:
https://bugs.freenas.org/issues/25237.
Pull in r309503 from upstream clang trunk (by Richard Smith):
PR33902: Invalidate line number cache when adding more text to
existing buffer.
This led to crashes as the line number cache would report a bogus
line number for a line of code, and we'd try to find a nonexistent
column within the line when printing diagnostics.
This fixes an assertion when building the graphics/champlain port.
Scott Long [Sun, 30 Jul 2017 06:53:58 +0000 (06:53 +0000)]
Split the interrupt setup code into two parts: allocation and configuration.
Do the allocation before requesting the IOCFacts message. This triggers
the LSI firmware to recognize the multiqueue should be enabled if available.
Multiqueue isn't used by the driver yet, but this also fixes a problem with
the cached IOCFacts not matching latter checks, leading to potential problems
with error recovery.
As a side-effect, fetch the driver tunables as early as possible.
Ian Lepore [Sat, 29 Jul 2017 23:45:57 +0000 (23:45 +0000)]
Replace the pcf8563 i2c RTC driver with a new nxprtc driver which handles
all the chips in the NXP PCA212x and PCA/PCF85xx series. In addition to
supporting more chips, this driver uses the countdown timer on the chips as
a fractional seconds counter, giving it a resolution of about 15 milliseconds.