Kenneth D. Merry [Thu, 15 Aug 2013 22:52:39 +0000 (22:52 +0000)]
Change the way that unmapped I/O capability is advertised.
The previous method was to set the D_UNMAPPED_IO flag in the cdevsw
for the driver. The problem with this is that in many cases (e.g.
sa(4)) there may be some instances of the driver that can handle
unmapped I/O and some that can't. The isp(4) driver can handle
unmapped I/O, but the esp(4) driver currently cannot. The cdevsw
is shared among all driver instances.
So instead of setting a flag on the cdevsw, set a flag on the cdev.
This allows drivers to indicate support for unmapped I/O on a
per-instance basis.
sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it
with an SI_UNMAPPED cdev flag.
kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine
whether or not a particular driver can handle
unmapped I/O.
geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs.
Since GEOM will create a temporary mapping when
needed, setting SI_UNMAPPED unconditionally will
work.
Remove the D_UNMAPPED_IO flag.
nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here
if NVME_UNMAPPED_BIO_SUPPORT is enabled.
vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a
cdev instead of the D_UNMAPPED_IO flag on the cdevsw.
sys/param.h: Bump __FreeBSD_version to 1000045 for the switch from
setting the D_UNMAPPED_IO flag in the cdevsw to setting
SI_UNMAPPED in the cdev.
Colin Percival [Thu, 15 Aug 2013 20:19:17 +0000 (20:19 +0000)]
Change the queue of locks in kern_rangelock.c from holding lock requests in
the order that they arrive, to holding
(a) granted write lock requests, followed by
(b) granted read lock requests, followed by
(c) ungranted requests, in order of arrival.
This changes the stopping condition for iterating through granted locks to
see if a new request can be granted: When considering a read lock request,
we can stop iterating as soon as we see a read lock request, since anything
after that point is either a granted read lock request or a request which
has not yet been granted. (For write lock requests, we must still compare
against all granted lock requests.)
For workloads with R parallel reads and W parallel writes, this improves
the time spent from O((R+W)^2) to O(W*(R+W)); i.e., heavy parallel-read
workloads become significantly more scalable.
No statistically significant change in buildworld time has been measured,
but synthetic tests of parallel 'dd > /dev/null' and 'openssl enc >/dev/null'
with the input file cached yield dramatic (up to 10x) improvement with high
(up to 128 processes) levels of parallelism.
Kenneth D. Merry [Thu, 15 Aug 2013 16:41:27 +0000 (16:41 +0000)]
Export the maxio field in the CAM XPT_PATH_INQ CCB in the isp(4)
driver.
This tells consumers up the stack the maximum I/O size that the
controller can handle.
The I/O size is bounded by the number of scatter/gather segments
the controller can handle and the page size. For an amd64 system,
it works out to around 5MB.
Attilio Rao [Thu, 15 Aug 2013 11:01:25 +0000 (11:01 +0000)]
On the recovery path for vm_page_alloc(), if a page had been requested
wired, unwind back the wiring bits otherwise we can end up freeing a
page that is considered wired.
Mark Johnston [Thu, 15 Aug 2013 04:08:55 +0000 (04:08 +0000)]
Specify SDT probe argument types in the probe definition itself rather than
using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API
to allow probes with dynamically-translated types.
Michael Tuexen [Wed, 14 Aug 2013 21:51:32 +0000 (21:51 +0000)]
Don't send uninitialized memory (two instances of 4 bytes) in
every cookie on the wire. This bug was reported in
https://bugzilla.mozilla.org/show_bug.cgi?id=905080
Rick Macklem [Wed, 14 Aug 2013 21:11:26 +0000 (21:11 +0000)]
Fix several performance related issues in the new NFS server's
DRC for NFS over TCP.
- Increase the size of the hash tables.
- Create a separate mutex for each hash list of the TCP hash table.
- Single thread the code that deletes stale cache entries.
- Add a tunable called vfs.nfsd.tcphighwater, which can be increased
to allow the cache to grow larger, avoiding the overhead of frequent
scans to delete stale cache entries.
(The default value will result in frequent scans to delete stale cache
entries, analagous to what the pre-patched code does.)
- Add a tunable called vfs.nfsd.cachetcp that can be used to disable
DRC caching for NFS over TCP, since the old NFS server didn't DRC cache TCP.
It also adjusts the size of nfsrc_floodlevel dynamically, so that it is
always greater than vfs.nfsd.tcphighwater.
For UDP the algorithm remains the same as the pre-patched code, but the
tunable vfs.nfsd.udphighwater can be used to allow the cache to grow
larger and reduce the overhead caused by frequent scans for stale entries.
UDP also uses a larger hash table size than the pre-patched code.
Reported by: wollman
Tested by: wollman (earlier version of patch)
Submitted by: ivoras (earlier patch)
Reviewed by: jhb (earlier version of patch)
MFC after: 1 month
Jeff Roberson [Tue, 13 Aug 2013 22:40:43 +0000 (22:40 +0000)]
- Add a statically allocated memguard arena since it is needed very early
on.
- Pass the appropriate flags to vmem_xalloc() when allocating space for
the arena from kmem_arena.
Jeff Roberson [Tue, 13 Aug 2013 21:56:16 +0000 (21:56 +0000)]
Improve pageout flow control to wakeup more frequently and do less work while
maintaining better LRU of active pages.
- Change v_free_target to include the quantity previously represented by
v_cache_min so we don't need to add them together everywhere we use them.
- Add a pageout_wakeup_thresh that sets the free page count trigger for
waking the page daemon. Set this 10% above v_free_min so we wakeup before
any phase transitions in vm users.
- Adjust down v_free_target now that we're willing to accept more pagedaemon
wakeups. This means we process fewer pages in one iteration as well,
leading to shorter lock hold times and less overall disruption.
- Eliminate vm_pageout_page_stats(). This was a minor variation on the
PQ_ACTIVE segment of the normal pageout daemon. Instead we now process
1 / vm_pageout_update_period pages every second. This causes us to visit
the whole active list every 60 seconds. Previously we would only maintain
the active LRU when we were short on pages which would mean it could be
woefully out of date.
Peter Wemm [Tue, 13 Aug 2013 20:38:55 +0000 (20:38 +0000)]
vfork(2) was listed as deprecated in 1994 (r1573) and was the false
reports of its impending demise were removed in 2009 (r199257).
However, in 1996 (r16117) system(3) was switched from vfork(2) to
fork(2) based partly on this. Switch back to vfork(2). This has a
dramatic effect in cases of extreme mmap use - such as excessive
abuse (500+) of shared libraries.
popen(3) has used vfork(2) for a while. vfork(2) isn't going anywhere.
John Baldwin [Tue, 13 Aug 2013 18:45:58 +0000 (18:45 +0000)]
Some small cleanups to the fixes in r180340:
- Set NOTE_TRACKERR before running filt_proc(). If the knote did not
have NOTE_FORK set in fflags when registered, then the TRACKERR event
could miss being posted.
- Don't pass the pid in to filt_proc() for NOTE_FORK events. The special
handling for pids is done knote_fork() directly and no longer in
filt_proc().
Pedro F. Giffuni [Tue, 13 Aug 2013 15:40:43 +0000 (15:40 +0000)]
Define ext2fs local types and use them.
Add definitions for e2fs_daddr_t, e4fs_daddr_t in addition
to the already existing e2fs_lbn_t and adjust them for ext4.
Other than making the code more readable these changes should
fix problems related to big filesystems.
Setting the proper types can be tricky so the process was
helped by looking at UFS. In our implementation, logical block
numbers can be negative and the code depends on it. In ext2,
block numbers are unsigned so it is convenient to keep
e2fs_daddr_t unsigned and use the complete 32 bits. In the
case of e4fs_daddr_t, while the value should be unsigned, for
ext4 we only need to support 48 bits so preserving an extra
bit from the sign is not an issue.
While here also drop the ext2_setblock() prototype that was
never used.
Adrian Chadd [Tue, 13 Aug 2013 09:58:27 +0000 (09:58 +0000)]
ieee80211_rate2plcp() and ieee80211_rate2phytype() are both pre-11n
routines and thus assert if one passes in a rate code with the
high bit set.
Since the high bit can indicate either IEEE80211_RATE_BASIC or
IEEE80211_RATE_MCS, it's up to the caller to determine whether
the rate is 11n or not, and either mask out the BASIC bit, or
call a different function.
(Yes, this does mean that net80211 should grow 11n-aware rate2phytype()
and rate2plcp() functions..)
This may need to happen for the other drivers - it's currently only
done (now) for iwn(4) and bwi(4).
Peter Wemm [Tue, 13 Aug 2013 07:15:01 +0000 (07:15 +0000)]
The iconv in libc did two things - implement the standard APIs, the GNU
extensions and also tried to be link time compatible with ports libiconv.
This splits that functionality and enables the parts that shouldn't
interfere with the port by default.
WITH_ICONV (now on by default) - adds iconv.h, iconv_open(3) etc.
WITH_LIBICONV_COMPAT (off by default) adds the libiconv_open etc API, linker
symbols and even a stub libiconv.so.3 that are good enough to be able
to 'pkg delete -f libiconv' on a running system and reasonably expect it
to work.
I have tortured many machines over the last few days to try and reduce
the possibilities of foot-shooting as much as I can. I've successfully
recompiled to enable and disable the libiconv_compat modes, ports that use
libiconv alongside system iconv etc. If you don't enable the
WITH_LIBICONV_COMPAT switch, they don't share symbol space.
This is an extension of behavior on other system. iconv(3) is a standard
libc interface and libiconv port expects to be able to run alongside it on
systems that have it.
Mark Johnston [Tue, 13 Aug 2013 03:10:39 +0000 (03:10 +0000)]
FreeBSD's DTrace implementation has a few problems with respect to handling
probes declared in a kernel module when that module is unloaded. In
particular,
* Unloading a module with active SDT probes will cause a panic. [1]
* A module's (FBT/SDT) probes aren't destroyed when the module is unloaded;
trying to use them after the fact will generally cause a panic.
This change fixes both problems by porting the DTrace module load/unload
handlers from illumos and registering them with the corresponding
EVENTHANDLER(9) handlers. This allows the DTrace framework to destroy all
probes defined in a module when that module is unloaded, and to prevent a
module unload from proceeding if some of its probes are active. The latter
problem has already been fixed for FBT probes by checking lf->nenabled in
kern_kldunload(), but moving the check into the DTrace framework generalizes
it to all kernel providers and also fixes a race in the current
implementation (since a probe may be activated between the check and the
call to linker_file_unload()).
Additionally, the SDT implementation has been reworked to define SDT
providers/probes/argtypes in linker sets rather than using SYSINIT/SYSUNINIT
to create and destroy SDT probes when a module is loaded or unloaded. This
simplifies things quite a bit since it means that pretty much all of the SDT
code can live in sdt.ko, and since it becomes easier to integrate SDT with
the DTrace framework. Furthermore, this allows FreeBSD to be quite flexible
in that SDT providers spanning multiple modules can be created on the fly
when a module is loaded; at the moment it looks like illumos' SDT
implementation requires all SDT probes to be statically defined in a single
kernel table.
Mark Johnston [Tue, 13 Aug 2013 03:09:00 +0000 (03:09 +0000)]
Remove some unused fields from struct linker_file. They were added in
r172862 for use by the DTrace SDT framework but don't seem to have ever
been used.
Mark Johnston [Tue, 13 Aug 2013 03:07:49 +0000 (03:07 +0000)]
Add event handlers for module load and unload events. The load handlers are
called after the module has been loaded, and the unload handlers are called
before the module is unloaded. Moreover, the module unload handlers may
return an error to prevent the unload from proceeding.
Glen Barber [Tue, 13 Aug 2013 02:31:46 +0000 (02:31 +0000)]
Make sure bootonly.iso for -BETAs and -RCs use the releases/
directory on the FTP mirrors to fetch distributions, since
these are always pushed to releases/ during the release cycle.
Jack F Vogel [Tue, 13 Aug 2013 00:25:39 +0000 (00:25 +0000)]
Alter the mq_start routine to do a TRYLOCK and call to the locked routine
rather than just queueing. The former code was an attempt at getting
UDP performance up, but there have been customer reports of problems with it,
so the ixgbe approach seems the best solution for now.
Scott Long [Mon, 12 Aug 2013 23:30:01 +0000 (23:30 +0000)]
Update PCI drivers to no longer look at the MEMIO-enabled bit in the PCI
command register. The lazy BAR allocation code in FreeBSD sometimes
disables this bit when it detects a range conflict, and will re-enable
it on demand when a driver allocates the BAR. Thus, the bit is no longer
a reliable indication of capability, and should not be checked. This
results in the elimination of a lot of code from drivers, and also gives
the opportunity to simplify a lot of drivers to use a helper API to set
the busmaster enable bit.
This changes fixes some recent reports of disk controllers and their
associated drives/enclosures disappearing during boot.
Jack F Vogel [Mon, 12 Aug 2013 22:54:38 +0000 (22:54 +0000)]
Improve the MSIX setup code in the drivers, thanks to Marius for
the changes. Make sure that pci_alloc_msix() does give us the vectors
we need and fall back to MSI when it doesn't, also release any that
were allocated when insufficient.
Pedro F. Giffuni [Mon, 12 Aug 2013 21:34:48 +0000 (21:34 +0000)]
Add read-only support for extents in ext2fs.
Basic support for extents was implemented by Zheng Liu as part
of his Google Summer of Code in 2010. This support is read-only
at this time.
In addition to extents we also support the huge_file extension
for read-only purposes. This works nicely with the additional
support for birthtime/nanosec timestamps and dir_index that
have been added lately.
The implementation may not work for all ext4 filesystems as
it doesn't support some features that are being enabled by
default on recent linux like flex_bg. Nevertheless, the feature
should be very useful for migration or simple access in
filesystems that have been converted from ext2/3 or don't use
incompatible features.
Special thanks to Zheng Liu for his dedication and continued
work to support ext2 in FreeBSD.
Submitted by: Zheng Liu (lz@)
Reviewed by: Mike Ma, Christoph Mallon (previous version)
Sponsored by: Google Inc.
MFC after: 3 weeks
Make check for unknown login class actually work. Previously, using the "-c" option
with login class not defined in login.conf(5) would silently fail, resulting in using
the default login class.
Temporarily revert sendmail 8.14.7 change to getipnodebyname() flags to
prevent problems between the resolver and Microsoft DNS servers with
AAAA lookups. The upstream open source project will work on a more
permanent fix for the next release. Issue noted by Pavel Timofeev.
Michael Tuexen [Mon, 12 Aug 2013 13:52:15 +0000 (13:52 +0000)]
Make the features a 64-bit value instead of 32-bit.
This will allow an easier integration of the support
for NDATA.
While there, do also some minor cleanups.
Obtained from: rrs@
MFC after: 2 weeks
Peter Wemm [Mon, 12 Aug 2013 09:56:52 +0000 (09:56 +0000)]
Give up on using iconv to convert to UTF-8 at build time. I don't see any
practical way to make iconv(1) as a build tool. Instead pre-convert.
This gives us UTF-8 nvi catalogs even on systems without iconv enabled.
Devin Teske [Mon, 12 Aug 2013 03:52:23 +0000 (03:52 +0000)]
Add optional support for default override of standard setup; but only if
corresponding functions are provided. If override function does not exist,
boot remains unmodified. This patch should not result in any changes.