davidxu [Wed, 30 Nov 2005 05:12:03 +0000 (05:12 +0000)]
Last step to make mq_notify conform to POSIX standard, If the process
has successfully attached a notification request to the message queue
via a queue descriptor, file closing should remove the attachment.
bde [Wed, 30 Nov 2005 04:56:49 +0000 (04:56 +0000)]
Fixed the hi+lo approximation to log(2). The normal 17+24 bit decomposition
that was used doesn't work normally here, since we want to be able to
multiply `hi' by the exponent of x _exactly_, and the exponent of x has
more than 7 significant bits for most denormal x's, so the multiplication
was not always exact despite a cloned comment claiming that it was. (The
comment is correct in the double precision case -- with the normal 33+53
bit decomposition the exponent can have 20 significant bits and the extra
bit for denormals is only the 11th.)
Fixing this had little or no effect for denormals (I think because
more precision is inherently lost for denormals than is lost by roundoff
errors in the multiplication).
The fix is to reduce the precision of the decomposition to 16+24 bits.
Due to 2 bugs in the old deomposition and numerical accidents, reducing
the precision actually increased the precision of hi+lo. The old hi+lo
had about 39 bits instead of at least 41 like it should have had.
There were off-by-1-bit errors in each of hi and lo, apparently due
to mistranslation from the double precision hi and lo. The correct
16 bit hi happens to give about 19 bits of precision, so the correct
hi+lo gives about 43 bits instead of at least 40. The end result is
that expf() is now perfectly rounded (to nearest) except in 52561 cases
instead of except in 67027 cases, and the maximum error is 0.5013 ulps
instead of 0.5023 ulps.
jhb [Tue, 29 Nov 2005 23:07:14 +0000 (23:07 +0000)]
Fix snderr() to not leak the socket buffer lock if an error occurs in
sosend(). Robert accidentally changed the snderr() macro to jump to the
out label which assumes the lock is already released rather than the
release label which drops the lock in his previous change to sosend().
This should fix the recent panics about returning from write(2) with the
socket lock held and the most recent LOR on current@.
peter [Tue, 29 Nov 2005 22:54:49 +0000 (22:54 +0000)]
The DEFAULTS changes caused the user specified config file to be opened
much later than before, and it is now after we do a mkdir ../compile/FILE.
As a result, if you do 'config DOESNOTEXIST', it now creates the directory
../config/DOESNOTEXIST. It did not do that before. If DEFAULTS does not
exist, it still fails early before any permanent changes.
This shameless hack restores the old behavior of ensuring the config file
actually exists before mkdiring its counterpart directory.
Now I can rmdir ../compile/D and it will stay dead, after my fingers keep
sabotaging me with 'config D<tab><enter>'. (Some of my kernel names
started with D, which used to be 1-character unique and my fingers knew
this very well...)
jhb [Tue, 29 Nov 2005 17:11:09 +0000 (17:11 +0000)]
- We don't install USD docs for games anymore since the games with docs
(trek) aren't in the base system anymore.
- dm(8) isn't in the base system anymore either, so don't xref it either.
jhb [Tue, 29 Nov 2005 16:51:49 +0000 (16:51 +0000)]
- Axe the PARTITIONING and IOCTLS section as this has been made obsolete
now that all that stuff has been abstracted out of the disk drivers with
GEOM.
- Reference bsdlabel(8) rather than disklabel(8).
Ok'd by: phk, scottl (1)
Submitted by: Björn König (2)
ru [Tue, 29 Nov 2005 09:37:42 +0000 (09:37 +0000)]
Drop the -I/usr/include (or any of its variants) from CFLAGS.
The sys/sys/stddef.h is here for some time now to fulfil the
kernel needs. It also was not reliable due to the exists(@)
check: in an empty module directory, "make depend; mv .depend
.depend~; make depend" ran mkdep(1) with different arguments.
glebius [Tue, 29 Nov 2005 00:11:01 +0000 (00:11 +0000)]
First step in removing welding between ipfw(4) and dummynet.
o Do not use ipfw_insn_pipe->pipe_ptr in locate_flowset(). The
_ipfw_insn_pipe isn't touched by this commit to preserve ABI
compatibility.
o To optimize the lookup of the pipe/flowset in locate_flowset()
introduce hashes for pipes and queues:
- To preserve ABI compatibility utilize the place of global list
pointer for SLIST_ENTRY.
- Introduce locate_flowset(queue nr) and locate_pipe(pipe nr).
o Rework all the dummynet code to deal with the hashes, not global
lists. Also did some style(9) changes in the code blocks that were
touched by this sweep:
- Be conservative about flowset and pipe variable names on stack,
use "fs" and "pipe" everywhere.
- Cleanup whitespaces.
- Sort variables.
- Give variables more meaningful names.
- Uppercase and dots in comments.
- ENOMEM when malloc(9) failed.
anholt [Mon, 28 Nov 2005 23:13:57 +0000 (23:13 +0000)]
Update DRM to CVS snapshot as of 2005-11-28. Notable changes:
- S3 Savage driver ported.
- Added support for ATI_fragment_shader registers for r200.
- Improved r300 support, needed for latest r300 DRI driver.
- (possibly) r300 PCIE support, needs X.Org server from CVS.
- Added support for PCI Matrox cards.
- Software fallbacks fixed for Rage 128, which used to render badly or hang.
- Some issues reported by WITNESS are fixed.
- i915 module Makefile added, as the driver may now be working, but is untested.
- Added scripts for copying and preprocessing DRM CVS for inclusion in the
kernel. Thanks to Daniel Stone for getting me started on that.
jhb [Mon, 28 Nov 2005 20:18:43 +0000 (20:18 +0000)]
If we get a stray interrupt, return after logging it. In the extremely
rare case of a stray interrupt to an unregistered source (such as a stray
interrupt from the 8259As when using APIC), this could result in a page
fault when it tried to walk the list of interrupt handlers to execute
INTR_FAST handlers. This bug was introduced with the intr_event changes,
so it's not present in 5.x or 6.x.
Submitted by: Mark Tinguely tinguely at casselton dot net
jhb [Mon, 28 Nov 2005 19:09:08 +0000 (19:09 +0000)]
When checking to see if a process has exceeded its time limit, flag the
process as over the limit when its time is >= to the limit rather than >
the limit. Technically, if p->p_rux.rux_runtime.sec == p->p_pcpulimit
and p->p_rux.rux_runtime.frac == 0, the process hasn't exceeded the limit
yet. However, having the fraction exactly equal to 0 is rather rare, and
it is not worth the overhead to handle that edge case. With just the >
comparison, the process would have to exceed its limit by almost a second
before it was killed.
PR: kern/83192
Submitted by: Maciej Zawadzinski mzawadzinski at gmail dot com
Reviewed by: bde
MFC after: 1 week
rwatson [Mon, 28 Nov 2005 18:09:03 +0000 (18:09 +0000)]
Break out functionality in sosend() responsible for building mbuf
chains and copying in mbufs from the body of the send logic, creating
a new function sosend_copyin(). This changes makes sosend() almost
readable, and will allow the same logic to be used by tailored socket
send routines.
imp [Mon, 28 Nov 2005 17:51:31 +0000 (17:51 +0000)]
Version 600004 is better than 700000 given other changes that are in
the pipeline. We had to bump the version for 600004 because the old
parser got confused and generated bogus output.
imp [Mon, 28 Nov 2005 17:47:54 +0000 (17:47 +0000)]
600004 is a better new version than 700000 based on some future commits to
this file. With ru@'s approval, change it to this version. In this case we
had to bump the version because the old parser would choke on | in the new
'or' syntax and consider that a device.
jhb [Mon, 28 Nov 2005 16:30:16 +0000 (16:30 +0000)]
Restore the previous state after a FILL operation in properties_read()
rather than forcing the state to LOOK. If we are in the middle of parsing
a line when we have to do a FILL we would have lost any token we were in
the middle of parsing and would have treated the next character as being
at the start of a new line instead.
bde [Mon, 28 Nov 2005 11:46:20 +0000 (11:46 +0000)]
Rearranged the polynomial evaluation some more to reduce dependencies.
Instead of echoing the code in a comment, try to describe why we split
up the evaluation in a special way.
The new optimization is mostly to move the evaluation of w = z*z later
so that everything else (except z = x*x) doesn't have to wait for w.
On Athlons, FP multiplication has a latency of 4 cycles so this
optimization saves 4 cycles per call provided no new dependencies are
introduced. Tweaking the other terms in to reduce dependencies saves
a couple more cycles in some cases (more on AXP than on A64; up to 8
cycles out of 56 altogether in some cases). The previous version had
a similar optimization for s = z*x. Special optimizations like these
probably have a larger effect than the simple 2-way vectorization
permitted (but not activated by gcc) in the old version, since 2-way
vectorization is not enough and the polynomial's degree is so small
in the float case that non-vectorizable dependencies dominate.
On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 34-55 cycles (was 39-59 cycles).
bde [Mon, 28 Nov 2005 08:32:15 +0000 (08:32 +0000)]
Fixed about 50 million errors of infinity ulps and about 3 million errors
of between 1.0 and 1.8509 ulps for lgammaf(x) with x between -2**-21 and
-2**-70.
As usual, the cutoff for tiny args was not correctly translated to
float precision. It was 2**-70 but 2**-21 works. Not as usual, having
a too-small threshold was worse than a pessimization. It was just a
pessimization for (positive) args between 2**-70 and 2**-21, but for
the first ~50 million (negative) args below -2**-70, the general code
overflowed and gave a result of infinity instead of correct (finite)
results near 70*log(2). For the remaining ~361 million negative args
above -2**21, the general code gave almost-acceptable errors (lgamma[f]()
is not very accurate in general) but the pessimization was larger than
for misclassified tiny positive args.
Now the max error for lgammaf(x) with |x| < 2**-21 is 0.7885 ulps, and
speed and accuracy are almost the same for positive and negative args
in this range. The maximum error overall is still infinity ulps.
A cutoff of 2**-70 is probably wastefully small for the double precision
case. Smaller cutoffs can be used to reduce the max error to nearly
0.5 ulps for tiny args, but this is useless since the general algrorithm
for nearly-tiny args is not nearly that accurate -- it has a max error of
about 1 ulp.
bde [Mon, 28 Nov 2005 06:15:10 +0000 (06:15 +0000)]
Exploit skew-symmetry to avoid an operation: -sin(x-A) = sin(A-x). This
gives a tiny but hopefully always free optimization in the 2 quadrants
to which it applies. On Athlons, it reduces maximum latency by 4 cycles
in these quadrants but has usually has a smaller effect on total time
(typically ~2 cycles (~5%), but sometimes 8 cycles when the compiler
generates poor code).
bde [Mon, 28 Nov 2005 05:46:13 +0000 (05:46 +0000)]
Try to use the "proximity" (~) operator consistently in comments
(x ~<= a, not x <= ~a). This got messed up in some places when the
comments were moved from e_rem_pio2f.c.
bde [Mon, 28 Nov 2005 04:58:57 +0000 (04:58 +0000)]
Use only double precision for "kernel" cosf and sinf (except for
returning float). The functions are renamed from __kernel_{cos,sin}f()
to __kernel_{cos,sin}df() so that misuses of them will cause link errors
and not crashes.
This version is an almost-routine translation with no special optimizations
for accuracy or efficiency. The not-quite-routine part is that in
__kernel_cosf(), regenerating the minimax polynomial with double
precision coefficients gives a coefficient for the x**2 term that is
not quite -0.5, so the literal 0.5 in the code and the related `hz'
variable need to be modified; also, the special code for reducing the
error in 1.0-x**2*0.5 is no longer needed, so it is convenient to
adjust all the logic for the x**2 term a little. Note that without
extra precision, it would be very bad to use a coefficient of other
than -0.5 for the x**2 term -- the old version depends on multiplication
by -0.5 being infinitely precise so as not to need even more special
code for reducing the error in 1-x**2*0.5.
This gives an unimportant increase in accuracy, from ~0.8 to ~0.501
ulps. Almost all of the error is from the final rounding step, since
the choice of the minimax polynomials so that their contribution to the
error is a bit less than 0.5 ulps just happens to give contributions that
are significantly less (~.001 ulps).
An Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
overall speed increases in the 10-20% range, despite giving a speed
decrease of typically 19% (from 31 cycles up to 37) for sinf() on args
in [-pi/4, pi/4].
rodrigc [Mon, 28 Nov 2005 03:21:58 +0000 (03:21 +0000)]
Remove commented out reference to posix4/mqueue.h. It hasn't been installed
for 3 years, and now we have another (working) implementation
of POSIX message queues elsewhere in the source tree.
iedowse [Sun, 27 Nov 2005 09:05:37 +0000 (09:05 +0000)]
The ohci driver's processing of completed transfer descriptors (TDs)
appeared to rely on all kinds of non-guaranteed behaviours: the
transfer abort code assumed that TDs with no interrupt timeout
configured would end up on the done queue within 20ms, the done
queue processing assumed that all TDs from a transfer would appear
at the same time, and there were access-after-free bugs triggered
on failed transfers.
Attempt to fix these problems by the following changes:
- Use a maximum (6-frame) interrupt delay instead of no interrupt
delay to ensure that the 20ms wait in ohci_abort_xfer() is enough
for the TDs to have been taken off the hardware done queue.
- Defer cancellation of timeouts and freeing of TDs until we either
hit an error or reach the final TD.
- Remove TDs from the done queue before freeing them so that it
is safe to continue traversing the done queue.
This appears to fix a hang that was reproducable with revision 1.67
or 1.68 of ulpt.c (earlier revisions had a different transfer
pattern). With certain HP printers, the command "true > /dev/ulpt0"
would cause ohci_add_done() to spin because the done queue had a
loop. The list corruption was caused by a 3-TD transfer where the
first TD completed but remained on the internal host controller
done queue because it had no interrupt timeout. When the transfer
timed out, the TD got freed and reused, so it caused a loop in the
done queue when it was inserted a second time from a different
transfer.
keramida [Sun, 27 Nov 2005 07:30:21 +0000 (07:30 +0000)]
Clarify a comment to make it clear that it is NO_NIS that "If it is set"
refers to and add extra '#' comment characters at the beginning of two
lines that started with TABs, to avoid warnings like:
"/etc/make.conf", line 128: Unassociated shell command "# If set, you might need to adopt your"
"/etc/make.conf", line 129: Unassociated shell command "# nsswitch.conf(5) and remove `nis' entries."
kientzle [Sun, 27 Nov 2005 03:16:46 +0000 (03:16 +0000)]
Portability: Remove AC_CHECK_MALLOC from configure.ac.in.
libarchive doesn't make malloc(0) requests, so the autoconf
checks aren't needed and the autoconf workarounds for
broken malloc(0) just create problems.
Thanks to: Dan Nelson, who reports that this fixes libarchive on AIX 5.2
Add experimental low-precision clockid_t names corresponding to these
clocks, but implemented using cached timestamps in kernel rather than
a full time counter query. This offers a minimum update rate of 1/HZ,
but in practice will often be more frequent due to the frequency of
time stamping in the kernel:
New clockid_t name Approximates existing clockid_t
Add one additional new clockid_t, CLOCK_SECOND, which returns the
current second without performing a full time counter query or cache
lookup overhead to make sure the cached timestamp is stable. This is
intended to support very low granularity consumers, such as time(3).
The names, visibility, and implementation of the above are subject
to change, and will not be MFC'd any time soon. The goal is to
expose lower quality time measurement to applications willing to
sacrifice accuracy in performance critical paths, such as when taking
time stamps for the purpose of rescheduling select() and poll()
timeouts. Future changes might include retrofitting the time counter
infrastructure to allow the "fast" time query mechanisms to use a
different time counter, rather than a cached time counter (i.e.,
TSC).
NOTE: With different underlying time mechanisms exposed, using
different time query mechanisms in the same application may result in
relative non-monoticity or the appearance of clock stalling for a
single clockid_t, as a cached time stamp queried after a precision
time stamp lookup may be "before" the time returned by the earlier
live time counter query.
njl [Sat, 26 Nov 2005 07:36:53 +0000 (07:36 +0000)]
Add a locking stub to call acpi_cmbat_get_bif() now that it is directly
run from the taskqueue. There should probably be a better way to do this
later, but this suffices for now.
scottl [Sat, 26 Nov 2005 07:30:09 +0000 (07:30 +0000)]
The CAM interface is broken and seems to be causing lockups on boot. It
doesn't appear to have worked in a long time, so just disable it completely
for now.