Tai-hwa Liang [Thu, 1 Dec 2005 00:18:48 +0000 (00:18 +0000)]
Fixing yet another regression introduced in rev1.37 by preserving cs_local
pointer such that local to DOS code page conversion with combined option
'-L,-D' works again.
Maxim Sobolev [Wed, 30 Nov 2005 22:54:41 +0000 (22:54 +0000)]
It is unclear who is wrong and who is right, but when operating on
plain file bsdlabel(8) always writes label at a fixed offset from
its beginning (512 bytes), regardless of the sector size. At the same
time, bsdlabel geom class expects label to be available at the very
beginning of the second sector.
As a result, images prepared in userland for media with sector size
different from 512 bytes (i.e. 2k for cdroms) are not recognized by
the tasting mechanism.
Solve the problem by always looking for the label at 512-byte offset
if we can't find it at the beginning of the second sector and sector
size is not 512 bytes.
Ruslan Ermilov [Wed, 30 Nov 2005 18:15:06 +0000 (18:15 +0000)]
Teach this to create the "machine" and ${MACHINE_ARCH} (for pc98
only now) symbolic links in the kernel compile directory, rather
than relying on config(8) to do this. (The changes to config(8)
will be committed separately.) This is aimed towards making the
config(8) as lightweight as possible.
Bruce Evans [Wed, 30 Nov 2005 11:51:17 +0000 (11:51 +0000)]
Rearranged the polynomial evaluation to reduce dependencies, as in
k_tanf.c but with different details.
The polynomial is odd with degree 13 for tanf() and odd with degree
9 for sinf(), so the details are not very different for sinf() -- the
term with the x**11 and x**13 coefficients goes awaym and (mysteriously)
it helps to do the evaluation of w = z*z early although moving it later
was a key optimization for tanf(). The details are different but simpler
for cosf() because the polynomial is even and of lower degree.
On Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
an optimization of about 4 cycles (10%) in most cases (13% for sinf()
on AXP, but 0% for cosf() with gcc-3.3 -O1 on AXP). The best case
(sinf() with gcc-3.4 -O1 -fcaller-saves on A64) now takes 33-39 cycles
(was 37-45 cycles). Hardware sinf takes 74-129 cycles. Despite
being fine tuned for Athlons, the optimization is even larger on
some other arches (about 15% on ia64 (pluto2) and 20% on alpha (beast)
with gcc -O2 -fomit-frame-pointer).
Bruce Evans [Wed, 30 Nov 2005 06:47:18 +0000 (06:47 +0000)]
Fixed cosf(x) when x is a "negative" NaNs. I broke this in rev.1.10.
cosf(x) is supposed to return something like x when x is a NaN, and
we actually fairly consistently return x-x which is normally very like
x (on i386 and and it is x if x is a quiet NaN and x with the quiet bit
set if x is a signaling NaN. Rev.1.10 broke this by normalising x to
fabsf(x). It's not clear if fabsf(x) is should preserve x if x is a NaN,
but it actually clears the sign bit, and other parts of the code depended
on this.
The bugs can be fixed by saving x before normalizing it, and using the
saved x only for NaNs, and using uint32_t instead of int32_t for ix
so that negative NaNs are not misclassified even if fabsf() doesn't
clear their sign bit, but gcc pessimizes the saving very well, especially
on Athlon XPs (it generates extra loads and stores, and mixes use of
the SSE and i387, and this somehow messes up pipelines). Normalizing
x is not a very good optimization anyway, so stop doing it. (It adds
latency to the FPU pipelines, but in previous versions it helped except
for |x| <= 3pi/4 by simplifying the integer pipelines.) Use the same
organization as in s_sinf.c and s_tanf.c with some branches reordered.
These changes combined recover most of the performance of the unfixed
version on A64 but still lose 10% on AXP with gcc-3.4 -O1 but not with
gcc-3.3 -O1.
David Xu [Wed, 30 Nov 2005 05:12:03 +0000 (05:12 +0000)]
Last step to make mq_notify conform to POSIX standard, If the process
has successfully attached a notification request to the message queue
via a queue descriptor, file closing should remove the attachment.
Bruce Evans [Wed, 30 Nov 2005 04:56:49 +0000 (04:56 +0000)]
Fixed the hi+lo approximation to log(2). The normal 17+24 bit decomposition
that was used doesn't work normally here, since we want to be able to
multiply `hi' by the exponent of x _exactly_, and the exponent of x has
more than 7 significant bits for most denormal x's, so the multiplication
was not always exact despite a cloned comment claiming that it was. (The
comment is correct in the double precision case -- with the normal 33+53
bit decomposition the exponent can have 20 significant bits and the extra
bit for denormals is only the 11th.)
Fixing this had little or no effect for denormals (I think because
more precision is inherently lost for denormals than is lost by roundoff
errors in the multiplication).
The fix is to reduce the precision of the decomposition to 16+24 bits.
Due to 2 bugs in the old deomposition and numerical accidents, reducing
the precision actually increased the precision of hi+lo. The old hi+lo
had about 39 bits instead of at least 41 like it should have had.
There were off-by-1-bit errors in each of hi and lo, apparently due
to mistranslation from the double precision hi and lo. The correct
16 bit hi happens to give about 19 bits of precision, so the correct
hi+lo gives about 43 bits instead of at least 40. The end result is
that expf() is now perfectly rounded (to nearest) except in 52561 cases
instead of except in 67027 cases, and the maximum error is 0.5013 ulps
instead of 0.5023 ulps.
John Baldwin [Tue, 29 Nov 2005 23:07:14 +0000 (23:07 +0000)]
Fix snderr() to not leak the socket buffer lock if an error occurs in
sosend(). Robert accidentally changed the snderr() macro to jump to the
out label which assumes the lock is already released rather than the
release label which drops the lock in his previous change to sosend().
This should fix the recent panics about returning from write(2) with the
socket lock held and the most recent LOR on current@.
Peter Wemm [Tue, 29 Nov 2005 22:54:49 +0000 (22:54 +0000)]
The DEFAULTS changes caused the user specified config file to be opened
much later than before, and it is now after we do a mkdir ../compile/FILE.
As a result, if you do 'config DOESNOTEXIST', it now creates the directory
../config/DOESNOTEXIST. It did not do that before. If DEFAULTS does not
exist, it still fails early before any permanent changes.
This shameless hack restores the old behavior of ensuring the config file
actually exists before mkdiring its counterpart directory.
Now I can rmdir ../compile/D and it will stay dead, after my fingers keep
sabotaging me with 'config D<tab><enter>'. (Some of my kernel names
started with D, which used to be 1-character unique and my fingers knew
this very well...)
John Baldwin [Tue, 29 Nov 2005 17:11:09 +0000 (17:11 +0000)]
- We don't install USD docs for games anymore since the games with docs
(trek) aren't in the base system anymore.
- dm(8) isn't in the base system anymore either, so don't xref it either.
John Baldwin [Tue, 29 Nov 2005 16:51:49 +0000 (16:51 +0000)]
- Axe the PARTITIONING and IOCTLS section as this has been made obsolete
now that all that stuff has been abstracted out of the disk drivers with
GEOM.
- Reference bsdlabel(8) rather than disklabel(8).
Ok'd by: phk, scottl (1)
Submitted by: Björn König (2)
Ruslan Ermilov [Tue, 29 Nov 2005 09:37:42 +0000 (09:37 +0000)]
Drop the -I/usr/include (or any of its variants) from CFLAGS.
The sys/sys/stddef.h is here for some time now to fulfil the
kernel needs. It also was not reliable due to the exists(@)
check: in an empty module directory, "make depend; mv .depend
.depend~; make depend" ran mkdep(1) with different arguments.
Gleb Smirnoff [Tue, 29 Nov 2005 00:11:01 +0000 (00:11 +0000)]
First step in removing welding between ipfw(4) and dummynet.
o Do not use ipfw_insn_pipe->pipe_ptr in locate_flowset(). The
_ipfw_insn_pipe isn't touched by this commit to preserve ABI
compatibility.
o To optimize the lookup of the pipe/flowset in locate_flowset()
introduce hashes for pipes and queues:
- To preserve ABI compatibility utilize the place of global list
pointer for SLIST_ENTRY.
- Introduce locate_flowset(queue nr) and locate_pipe(pipe nr).
o Rework all the dummynet code to deal with the hashes, not global
lists. Also did some style(9) changes in the code blocks that were
touched by this sweep:
- Be conservative about flowset and pipe variable names on stack,
use "fs" and "pipe" everywhere.
- Cleanup whitespaces.
- Sort variables.
- Give variables more meaningful names.
- Uppercase and dots in comments.
- ENOMEM when malloc(9) failed.
Eric Anholt [Mon, 28 Nov 2005 23:13:57 +0000 (23:13 +0000)]
Update DRM to CVS snapshot as of 2005-11-28. Notable changes:
- S3 Savage driver ported.
- Added support for ATI_fragment_shader registers for r200.
- Improved r300 support, needed for latest r300 DRI driver.
- (possibly) r300 PCIE support, needs X.Org server from CVS.
- Added support for PCI Matrox cards.
- Software fallbacks fixed for Rage 128, which used to render badly or hang.
- Some issues reported by WITNESS are fixed.
- i915 module Makefile added, as the driver may now be working, but is untested.
- Added scripts for copying and preprocessing DRM CVS for inclusion in the
kernel. Thanks to Daniel Stone for getting me started on that.
John Baldwin [Mon, 28 Nov 2005 20:18:43 +0000 (20:18 +0000)]
If we get a stray interrupt, return after logging it. In the extremely
rare case of a stray interrupt to an unregistered source (such as a stray
interrupt from the 8259As when using APIC), this could result in a page
fault when it tried to walk the list of interrupt handlers to execute
INTR_FAST handlers. This bug was introduced with the intr_event changes,
so it's not present in 5.x or 6.x.
Submitted by: Mark Tinguely tinguely at casselton dot net
John Baldwin [Mon, 28 Nov 2005 19:09:08 +0000 (19:09 +0000)]
When checking to see if a process has exceeded its time limit, flag the
process as over the limit when its time is >= to the limit rather than >
the limit. Technically, if p->p_rux.rux_runtime.sec == p->p_pcpulimit
and p->p_rux.rux_runtime.frac == 0, the process hasn't exceeded the limit
yet. However, having the fraction exactly equal to 0 is rather rare, and
it is not worth the overhead to handle that edge case. With just the >
comparison, the process would have to exceed its limit by almost a second
before it was killed.
PR: kern/83192
Submitted by: Maciej Zawadzinski mzawadzinski at gmail dot com
Reviewed by: bde
MFC after: 1 week
Robert Watson [Mon, 28 Nov 2005 18:09:03 +0000 (18:09 +0000)]
Break out functionality in sosend() responsible for building mbuf
chains and copying in mbufs from the body of the send logic, creating
a new function sosend_copyin(). This changes makes sosend() almost
readable, and will allow the same logic to be used by tailored socket
send routines.
Warner Losh [Mon, 28 Nov 2005 17:51:31 +0000 (17:51 +0000)]
Version 600004 is better than 700000 given other changes that are in
the pipeline. We had to bump the version for 600004 because the old
parser got confused and generated bogus output.
Warner Losh [Mon, 28 Nov 2005 17:47:54 +0000 (17:47 +0000)]
600004 is a better new version than 700000 based on some future commits to
this file. With ru@'s approval, change it to this version. In this case we
had to bump the version because the old parser would choke on | in the new
'or' syntax and consider that a device.
John Baldwin [Mon, 28 Nov 2005 16:30:16 +0000 (16:30 +0000)]
Restore the previous state after a FILL operation in properties_read()
rather than forcing the state to LOOK. If we are in the middle of parsing
a line when we have to do a FILL we would have lost any token we were in
the middle of parsing and would have treated the next character as being
at the start of a new line instead.
Bruce Evans [Mon, 28 Nov 2005 11:46:20 +0000 (11:46 +0000)]
Rearranged the polynomial evaluation some more to reduce dependencies.
Instead of echoing the code in a comment, try to describe why we split
up the evaluation in a special way.
The new optimization is mostly to move the evaluation of w = z*z later
so that everything else (except z = x*x) doesn't have to wait for w.
On Athlons, FP multiplication has a latency of 4 cycles so this
optimization saves 4 cycles per call provided no new dependencies are
introduced. Tweaking the other terms in to reduce dependencies saves
a couple more cycles in some cases (more on AXP than on A64; up to 8
cycles out of 56 altogether in some cases). The previous version had
a similar optimization for s = z*x. Special optimizations like these
probably have a larger effect than the simple 2-way vectorization
permitted (but not activated by gcc) in the old version, since 2-way
vectorization is not enough and the polynomial's degree is so small
in the float case that non-vectorizable dependencies dominate.
On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 34-55 cycles (was 39-59 cycles).
Bruce Evans [Mon, 28 Nov 2005 08:32:15 +0000 (08:32 +0000)]
Fixed about 50 million errors of infinity ulps and about 3 million errors
of between 1.0 and 1.8509 ulps for lgammaf(x) with x between -2**-21 and
-2**-70.
As usual, the cutoff for tiny args was not correctly translated to
float precision. It was 2**-70 but 2**-21 works. Not as usual, having
a too-small threshold was worse than a pessimization. It was just a
pessimization for (positive) args between 2**-70 and 2**-21, but for
the first ~50 million (negative) args below -2**-70, the general code
overflowed and gave a result of infinity instead of correct (finite)
results near 70*log(2). For the remaining ~361 million negative args
above -2**21, the general code gave almost-acceptable errors (lgamma[f]()
is not very accurate in general) but the pessimization was larger than
for misclassified tiny positive args.
Now the max error for lgammaf(x) with |x| < 2**-21 is 0.7885 ulps, and
speed and accuracy are almost the same for positive and negative args
in this range. The maximum error overall is still infinity ulps.
A cutoff of 2**-70 is probably wastefully small for the double precision
case. Smaller cutoffs can be used to reduce the max error to nearly
0.5 ulps for tiny args, but this is useless since the general algrorithm
for nearly-tiny args is not nearly that accurate -- it has a max error of
about 1 ulp.
Bruce Evans [Mon, 28 Nov 2005 06:15:10 +0000 (06:15 +0000)]
Exploit skew-symmetry to avoid an operation: -sin(x-A) = sin(A-x). This
gives a tiny but hopefully always free optimization in the 2 quadrants
to which it applies. On Athlons, it reduces maximum latency by 4 cycles
in these quadrants but has usually has a smaller effect on total time
(typically ~2 cycles (~5%), but sometimes 8 cycles when the compiler
generates poor code).
Bruce Evans [Mon, 28 Nov 2005 05:46:13 +0000 (05:46 +0000)]
Try to use the "proximity" (~) operator consistently in comments
(x ~<= a, not x <= ~a). This got messed up in some places when the
comments were moved from e_rem_pio2f.c.
Bruce Evans [Mon, 28 Nov 2005 04:58:57 +0000 (04:58 +0000)]
Use only double precision for "kernel" cosf and sinf (except for
returning float). The functions are renamed from __kernel_{cos,sin}f()
to __kernel_{cos,sin}df() so that misuses of them will cause link errors
and not crashes.
This version is an almost-routine translation with no special optimizations
for accuracy or efficiency. The not-quite-routine part is that in
__kernel_cosf(), regenerating the minimax polynomial with double
precision coefficients gives a coefficient for the x**2 term that is
not quite -0.5, so the literal 0.5 in the code and the related `hz'
variable need to be modified; also, the special code for reducing the
error in 1.0-x**2*0.5 is no longer needed, so it is convenient to
adjust all the logic for the x**2 term a little. Note that without
extra precision, it would be very bad to use a coefficient of other
than -0.5 for the x**2 term -- the old version depends on multiplication
by -0.5 being infinitely precise so as not to need even more special
code for reducing the error in 1-x**2*0.5.
This gives an unimportant increase in accuracy, from ~0.8 to ~0.501
ulps. Almost all of the error is from the final rounding step, since
the choice of the minimax polynomials so that their contribution to the
error is a bit less than 0.5 ulps just happens to give contributions that
are significantly less (~.001 ulps).
An Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
overall speed increases in the 10-20% range, despite giving a speed
decrease of typically 19% (from 31 cycles up to 37) for sinf() on args
in [-pi/4, pi/4].
Craig Rodrigues [Mon, 28 Nov 2005 03:21:58 +0000 (03:21 +0000)]
Remove commented out reference to posix4/mqueue.h. It hasn't been installed
for 3 years, and now we have another (working) implementation
of POSIX message queues elsewhere in the source tree.
Ian Dowse [Sun, 27 Nov 2005 09:05:37 +0000 (09:05 +0000)]
The ohci driver's processing of completed transfer descriptors (TDs)
appeared to rely on all kinds of non-guaranteed behaviours: the
transfer abort code assumed that TDs with no interrupt timeout
configured would end up on the done queue within 20ms, the done
queue processing assumed that all TDs from a transfer would appear
at the same time, and there were access-after-free bugs triggered
on failed transfers.
Attempt to fix these problems by the following changes:
- Use a maximum (6-frame) interrupt delay instead of no interrupt
delay to ensure that the 20ms wait in ohci_abort_xfer() is enough
for the TDs to have been taken off the hardware done queue.
- Defer cancellation of timeouts and freeing of TDs until we either
hit an error or reach the final TD.
- Remove TDs from the done queue before freeing them so that it
is safe to continue traversing the done queue.
This appears to fix a hang that was reproducable with revision 1.67
or 1.68 of ulpt.c (earlier revisions had a different transfer
pattern). With certain HP printers, the command "true > /dev/ulpt0"
would cause ohci_add_done() to spin because the done queue had a
loop. The list corruption was caused by a 3-TD transfer where the
first TD completed but remained on the internal host controller
done queue because it had no interrupt timeout. When the transfer
timed out, the TD got freed and reused, so it caused a loop in the
done queue when it was inserted a second time from a different
transfer.
Clarify a comment to make it clear that it is NO_NIS that "If it is set"
refers to and add extra '#' comment characters at the beginning of two
lines that started with TABs, to avoid warnings like:
"/etc/make.conf", line 128: Unassociated shell command "# If set, you might need to adopt your"
"/etc/make.conf", line 129: Unassociated shell command "# nsswitch.conf(5) and remove `nis' entries."
Tim Kientzle [Sun, 27 Nov 2005 03:16:46 +0000 (03:16 +0000)]
Portability: Remove AC_CHECK_MALLOC from configure.ac.in.
libarchive doesn't make malloc(0) requests, so the autoconf
checks aren't needed and the autoconf workarounds for
broken malloc(0) just create problems.
Thanks to: Dan Nelson, who reports that this fixes libarchive on AIX 5.2