jeff [Mon, 31 Mar 2008 07:55:45 +0000 (07:55 +0000)]
- Since rev 1.142 of ffs_snapshot.c the interlock has not been required
to protect the v_lock pointer. Removing the interlock acquisition
here allows vn_lock() to proceed without requiring the interlock
at all.
- If the lock mutated while we were sleeping on it the interlock has
been dropped. It is conceivable that the upper layer code was
relying on the interlock and LK_NOWAIT to protect the identity or
state of the vnode while acquiring the lock. In this case return
EBUSY rather than trying the new lock to prevent potential races.
jeff [Mon, 31 Mar 2008 07:47:08 +0000 (07:47 +0000)]
- Don't free snapdata structures when they are no longer in use.
Keeping the lockmgr lock valid allows us to switch the v_lock pointer
in snapshot vnodes between the embedded lockmgr lock and snapdata
lock without needing the vnode interlock to protect against races
- Keep unused snapdata structures in a list.
- Add a function to lock the devvp and allocate a snapdata to it or
acquire a new one without races. The old function was safe from
creation races because we set the mount flag when creating snapshots
and thus serializing them. However, it might have been subject to
destroying races.
marcel [Sun, 30 Mar 2008 23:09:14 +0000 (23:09 +0000)]
Better implement I-cache invalidation. The previous implementation
was a kluge. This implementation matches the behaviour on powerpc
and sparc64.
While on the subject, make sure to invalidate the I-cache after
loading a kernel module.
bde [Sun, 30 Mar 2008 18:07:12 +0000 (18:07 +0000)]
Use fabs[f]() instead of bit fiddling for setting absolute values.
This makes little difference in float precision, but in double
precision gives a speedup of about 30% on amd64 (A64 CPU) and i386
(A64). This depends on fabs[f]() being inline and efficient. The
bit fiddling (or any use of SET_HIGH_WORD(), which libm does too
much because it was best on old 32-bit machines) always causes
packing overheads and sometimes causes stalls in the packing, since
it operates on only part of a variable in the double precision case.
It apparently did cause stalls in a critical path here.
bde [Sun, 30 Mar 2008 17:28:27 +0000 (17:28 +0000)]
Use the expression fabs(x+0.0)-fabs(y+0.0) instead of
fabs(x+0.0)+fabs(y+0.0) when mixing NaNs. This improves
consistency of the result by making it harder for the compiler to reorder
the operands. (FP addition is not necessarily commutative because the
order of operands makes a difference on some machines iff the operands are
both NaNs.)
bde [Sun, 30 Mar 2008 17:17:42 +0000 (17:17 +0000)]
Fix a missing mask in a hi+lo decomposition. Thus bug made the extra
precision in software useless, so hypotf() had some errors in the 1-2
ulp range unless there is extra precision in hardware (as happens on
i386).
jeff [Sun, 30 Mar 2008 11:31:14 +0000 (11:31 +0000)]
- Consistently return EDEADLK when presented with a new set that is
incompatible with existing bindings.
- Try to copyout the setid in cpuset() before migrating the proc to the
setid in case the user has supplied a bad buffer.
- Rename cpuset_root() and cpuset_base() to cpuset_ref{root,base} to
be more descriptive and free cpuset_root to be used as a different
type of symbol.
- Make cpuset_root the cpuset_t set of all cpus in the system. This
should contain the same bitmask as all_cpus presently.
- Add a CPU_CMP() macro to compare two sets.
dfr [Sun, 30 Mar 2008 09:36:17 +0000 (09:36 +0000)]
Don't call xdrrec_skiprecord in the non-blocking case. If
__xdrrec_getrec has returned TRUE, then we have a complete request in
the buffer - calling xdrrec_skiprecord is not necessary. In particular,
if there is another record already buffered on the stream,
xdrrec_skiprecord will discard both this request and the next
one, causing the call to xdr_callmsg to fail and the stream to be
closed.
mav [Sun, 30 Mar 2008 07:53:51 +0000 (07:53 +0000)]
- Account all node stats at the shape mode.
- Do not check destination hook presence, it will be done by netgraph.
- Use u_int instead of int in some places to simplify type conversions.
- Use NG_SEND_DATA_ONLY() macro instead of selfmade equivalent.
brooks [Sun, 30 Mar 2008 02:42:39 +0000 (02:42 +0000)]
Add a new function is_default_interface() which determines if this
interface is one with the default route (or there isn't one). Use it to
decide if we should adjust the default route and /etc/resolv.conf.
Fix the delete of the default route. The if statement was totally bogus
and the delete only worked due to a typo. [1]
Reported by: Jordan Coleman <jordan at JordanColeman dot com> [1]
MFC after: 1 week
jeff [Sat, 29 Mar 2008 23:36:26 +0000 (23:36 +0000)]
- Don't allow calls to vn_lock() with no lock type requested. Callers
which simply want a reference should use vref(). Callers which want
to check validity need to hold a lock while performing any action
based on that validity. vn_lock() would always release the interlock
before returning making any action synchronous with the validity check
impossible.
jeff [Sat, 29 Mar 2008 23:24:54 +0000 (23:24 +0000)]
- Simplify null_hashget() and null_hashins() by using vref() rather
than a complex series of steps involving vget() without a lock type
to emulate the same thing.
jhb [Sat, 29 Mar 2008 17:46:03 +0000 (17:46 +0000)]
Change kgdb_parse() to use wrapped versions of parse_expression() and
evaluate_expression() so that any errors are caught and cause the function
to return to 0. Otherwise the errors posted an exception (via longjmp())
that aborted the current operation. This fixes the kld handling for
older kernels (6.x and 7.x) that don't have the full pathname stored in
the kernel linker.
marcel [Sat, 29 Mar 2008 17:33:29 +0000 (17:33 +0000)]
Change the order from SI_ORDER_FIRST to SI_ORDER_ANY (within
SI_SUB_DRIVERS) to avoid loading schemes before all the GEOM
classes have been loaded and initialized. Otherwise we may
end up using mutexes that haven't been initialized (due to
g_retaste() posting an event).
das [Sat, 29 Mar 2008 16:19:35 +0000 (16:19 +0000)]
Document modff() and modfl(). Technically, modff() and modfl()
live in libm, while modf() lives in libc due to historical
mistakes. I'm claiming in the manpage that they all live in libm,
since programmers should not rely on the mistake.
jeff [Sat, 29 Mar 2008 07:06:13 +0000 (07:06 +0000)]
- Use vm_object_reference_locked() directly from
vm_object_reference(). This is intended to get rid of vget()
consumers who don't wish to acquire a lock. This is functionally
the same as calling vref(). vm_object_reference_locked() already
uses vref.
jhb [Sat, 29 Mar 2008 03:48:06 +0000 (03:48 +0000)]
Initialize the head pointer in kld_current_sos() to NULL to avoid returning
a junk pointer and possibly causing a seg fault if we don't have any
non-kernel klds (or are unable to walk the list due to core / kernel
mismatch).
mlaier [Sat, 29 Mar 2008 00:24:36 +0000 (00:24 +0000)]
Make ALTQ cope with disappearing interfaces (particularly common with mpd
and netgraph in gernal). This also allows to add queues for an interface
that is not yet existing (you have to provide the bandwidth for the
interface, however).
attilio [Fri, 28 Mar 2008 12:30:12 +0000 (12:30 +0000)]
b_waiters cannot be adequately protected by the interlock because it is
dropped after the call to lockmgr() so just revert this approach using
something similar to the precedent one:
BUF_LOCKWAITERS() just checks if there are waiters (not the actual number
of them) and it is based on newly introduced lockmgr_waiters() which
returns if the lockmgr has waiters or not. The name has been choosen
differently by old lockwaiters() in order to not confuse them.
KPI results enriched by this commit so __FreeBSD_version bumping and
manpage update will be happening soon.
'struct buf' also changes, so kernel ABI is disturbed.
brooks [Fri, 28 Mar 2008 07:57:52 +0000 (07:57 +0000)]
Add support for hardwiring ppp sessions to particular devices with new
per-profile variables of the form ppp_<profile>_unit. No ppp_unit
variable is supported since tying the same unit to more than one profile
won't work.
marcel [Fri, 28 Mar 2008 06:31:12 +0000 (06:31 +0000)]
When retasting, wither any existing GEOMs of the same class. This
allows the class to create a different GEOM for the same provider
as well as avoid that we end up with multiple GEOMs of the same
class with the same name.
For example, when a disk contains a PC98 partition table but
only MBR is supported, then the partition table can be treated
as a MBR. If support for PC98 is later loaded as a module, the
MBR scheme is pre-empted for the PC98 scheme as expected.
yongari [Fri, 28 Mar 2008 01:21:21 +0000 (01:21 +0000)]
In revision 1.70, 1.71 and 1.84 re(4) tried to workaround checksum
offload bugs by manual padding for short IP/UDP frames. Unfortunately
it seems that these workaround does not work reliably on newer PCIe
variants of RealTek chips.
To workaround the hardware bug, always pad short frames if Tx IP
checksum offload is requested. It seems that the hardware has a
bug in IP checksum offload handling. NetBSD manually pads short
frames only when the length of IP frame is less than 28 bytes but I
chose 60 bytes to safety. Also unconditionally set IP checksum
offload bit in Tx descriptor if any TCP or UDP checksum offload is
requested. This is the same way as Linux does but it's not
mentioned in data sheet.
jb [Thu, 27 Mar 2008 23:21:25 +0000 (23:21 +0000)]
The sources covered by Sun's CDDL have been repo copied below the
src/cddl and src/sys/cddl directories per the core@ decision following
the license review.
This change modifies the affected Makefiles to reference the sources
in their new location.
mav [Thu, 27 Mar 2008 23:02:30 +0000 (23:02 +0000)]
Remove ng_setisr() call from ng_dequeue(). It is useless as we any way
will never exit ngintr(), while there is some ready requests on the queue.
It was made years ago with hope of parallel queue processing by several
net threads. But even if we have several threads sometimes, we have no
rights to process queue in parallel as it will break original requests
serialization that is critically important for some setups.
mav [Thu, 27 Mar 2008 20:04:20 +0000 (20:04 +0000)]
Switch from timeval to bintime, to use 1/(2^20) of seconds instead of
microseconds. It allows to use bit shifts instead of some heavy 64bit
mul/div math operations.
iedowse [Thu, 27 Mar 2008 18:02:30 +0000 (18:02 +0000)]
Add IFF_NEEDSGIANT to IFF_CANTCHANGE, to prevent user-level code
from clearing the IFF_NEEDSGIANT flag on Giant-locked interfaces.
In particular, wpa_supplicant was doing this on USB interfaces,
causing panics when Giant-locked code was then called without Giant.
dfr [Thu, 27 Mar 2008 11:54:20 +0000 (11:54 +0000)]
Add kernel module support for nfslockd and krpc. Use the module system
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.
alc [Thu, 27 Mar 2008 04:34:17 +0000 (04:34 +0000)]
MFamd64 with few changes:
1. Add support for automatic promotion of 4KB page mappings to 2MB page
mappings. Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic
promotion is disabled. Tested by: kris
2. To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.
The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.
jb [Thu, 27 Mar 2008 01:33:26 +0000 (01:33 +0000)]
Allow awk (the one true one!) to handle 64 files instead of just 20.
The current FreeBSD syscall generation script uses all 20 and I need
another open file.
It's a shame that something named as the 'one-true-awk' is so limited
by an old denition like FOPEN_MAX when it could just make the file
handling dynamic.
This is done to avoid touching contrib sources on a vendor branch.
phk [Wed, 26 Mar 2008 22:12:00 +0000 (22:12 +0000)]
Back in the good old days, PC's had random pieces of rock for
frequency generation and what frequency the generated was anyones
guess.
In general the 32.768kHz RTC clock x-tal was the best, because that
was a regular wrist-watch Xtal, whereas the X-tal generating the
ISA bus frequency was much lower quality, often costing as much as
several cents a piece, so it made good sense to check the ISA bus
frequency against the RTC clock.
The other relevant property of those machines, is that they
typically had no more than 16MB RAM.
These days, CPU chips croak if their clocks are not tightly within
specs and all necessary frequencies are derived from the master
crystal by means if PLL's.
Considering that it takes on average 1.5 second to calibrate the
frequency of the i8254 counter, that more likely than not, we will
not actually use the result of the calibration, and as the final
clincher, we seldom use the i8254 for anything besides BEL in
syscons anyway, it has become time to drop the calibration code.
If you need to tell the system what frequency your i8254 runs,
you can do so from the loader using hw.i8254.freq or using the
sysctl kern.timecounter.tc.i8254.frequency.
brooks [Wed, 26 Mar 2008 21:54:48 +0000 (21:54 +0000)]
Allow the characters .-+/ to appear in ppp profile names by folding them
to _ when evaluating ppp_<profile>_nat and ppp_<profile>_mode. Document
the per-profile variables.
rwatson [Wed, 26 Mar 2008 21:29:13 +0000 (21:29 +0000)]
Add a comment explaining that we initialize the 'a' buffer for
zero-copy to the store buffer position on the BPF descriptor,
and the 'b' buffer as the free buffer in order to fill them in
the order documented in bpf(4).
jhb [Wed, 26 Mar 2008 20:48:07 +0000 (20:48 +0000)]
Fix a nit with the 'nofoo' options where 'foo' is mapped to 'nonofoo'
(such as 'atime' vs 'noatime'). The filesystems will always see either
'nofoo' or 'nonofoo', never plain 'foo'. As such, their list of valid
mount options should include 'nofoo' instead of 'foo'. With this fix,
you can do 'mount -u -o atime' on a FFS filesystem that isn't marked as
noatime without getting an error. You can also update a noatime FFS
filesystem mounted via mount(2) (e.g. 6.x /sbin/mount binary) to 'atime'
using nmount(2) (e.g. 7.x /sbin/mount binary).