ceri [Tue, 20 Dec 2005 11:04:01 +0000 (11:04 +0000)]
Liberation Day is no longer celebrated in Romania; rather a national
holiday is now celebrated on December 1st. From the PR:
December 1 was adopted as National Day in 1990, being the day of
celebration of the Great Assembly of Alba Iulia which voted for the
union of Transylvania with Romania and which symbolises the union of all
Romanians within a single state and the achievement of the unity of
Romanian national state. [1]
[1] LAW Number 10 from July 31st, 1990
Regarding the proclamation of the National Day of Romania
http://www.1decembrie.ro/en/index.php?option=com_content&task=view&id=1&Itemid=4
PR: docs/90673
Submitted by: Ion-Mihai "IOnut" Tetcu
Originally pointed out by: Cornel Ilie <cornel dot c punkt ilie at gmail punkt com>
bde [Tue, 20 Dec 2005 01:21:30 +0000 (01:21 +0000)]
Extract the high and low words together. With gcc-3.4 on uniformly
distributed non-large args, this saves about 14 of 134 cycles for
Athlon64s and about 5 of 199 cycles for AthlonXPs.
Moved the check for x == 0 inside the check for subnormals. With
gcc-3.4 on uniformly distributed non-large args, this saves another
5 cycles on Athlon64s and loses 1 cycle on AthlonXPs.
Use INSERT_WORDS() and not SET_HIGH_WORD() when converting the first
approximation from bits to double. With gcc-3.4 on uniformly distributed
non-large args, this saves another 4 cycles on both Athlon64s and
AthlonXPs.
Accessing doubles as 2 words may be an optimization on old CPUs, but on
current CPUs it tends to cause extra operations and pipeline stalls,
especially for writes, even when only 1 of the words needs to be accessed.
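As a rough illustration of the last point (not the libm source itself), the sketch below reads and writes both halves of a double through a single 64-bit access instead of two separate 32-bit word accesses. The helper names extract_words64() and insert_words64() are hypothetical, IEEE 754 doubles are assumed, and memcpy() is used here only to keep the example portable.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch both 32-bit halves with one 64-bit load. */
static void
extract_words64(double d, uint32_t *hi, uint32_t *lo)
{
	uint64_t bits;

	memcpy(&bits, &d, sizeof(bits));
	*hi = (uint32_t)(bits >> 32);
	*lo = (uint32_t)bits;
}

/* Hypothetical helper: build a double from both halves with one store. */
static double
insert_words64(uint32_t hi, uint32_t lo)
{
	uint64_t bits = ((uint64_t)hi << 32) | lo;
	double d;

	memcpy(&d, &bits, sizeof(d));
	return (d);
}
```

Writing both halves at once avoids the partial store to a double that tends to stall the pipeline when the value is reloaded as a whole.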
pjd [Tue, 20 Dec 2005 00:49:59 +0000 (00:49 +0000)]
Reduce Giant scope a bit, as fdrop() is believed to be MPSAFE.
The purpose of this change is consistency (not performance improvement:)),
as it was hard to tell if fdrop() is MPSAFE or not when I saw it sometimes
under the Giant and sometimes without it.
pjd [Tue, 20 Dec 2005 00:43:51 +0000 (00:43 +0000)]
vfs_mount_alloc() always returns 0, but what we really want is the newly
allocated 'struct mount *' pointer, so simplify the code a bit and return
the pointer directly.
mlaier [Tue, 20 Dec 2005 00:33:33 +0000 (00:33 +0000)]
Move PFSTATE_EXPIRING from sync_flags to a new local_flags. sync_flags has
special handling when zero. This caused no PFSYNC_ACT_DEL message and thus
dysfunction of pfflowd and state synchronisation in general.
Discovered by: thompsa
Good catch by: thompsa
MFC after: 7 days
marcel [Mon, 19 Dec 2005 20:20:36 +0000 (20:20 +0000)]
o Add the GNU symbol versioning section constants (SHT_GNU_verdef,
SHT_GNU_verneed, SHT_GNU_versym),
o Fix the definition of DT_HIOS -- it was short an 'f'...
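For reference, a minimal sketch of the OS-specific dynamic tag range as given in the generic ELF specification. The helper dt_is_os_specific() is hypothetical and only shows why the upper bound matters.

```c
#include <stdint.h>

/* Bounds of the OS-specific dynamic tag range (generic ELF spec). */
#define DT_LOOS	0x6000000d
#define DT_HIOS	0x6ffff000

/* Hypothetical helper: is this d_tag in the OS-specific range? */
static int
dt_is_os_specific(int64_t tag)
{
	return (tag >= DT_LOOS && tag <= DT_HIOS);
}
```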
sobomax [Mon, 19 Dec 2005 18:39:01 +0000 (18:39 +0000)]
(forced)
We loaded PAE kernels at 2MB and non-PAE kernels at 4MB to skip
the first PSE page (PSE pages are 2MB on PAE), not at 8MB as it
was incorrectly specified in the previous commit.
dougb [Mon, 19 Dec 2005 10:57:00 +0000 (10:57 +0000)]
Clear up problems with /etc/rc.d/{abi|cleanvar|cleartmp} brought
to light by the PR. Specifically, convert these three scripts
into good rc.d citizens, making sure that their functionality
is preserved, but the rc.d framework rules are not broken.
Add support for cleanvar as a regular rc.d script in the
default rc.conf, and document this in the man page.
Add a descriptive comment to rc.conf regarding the
three emulation/compatibility services provided by abi
so users will not be confused by these services not having
their own startup scripts.
PR: conf/84574
Submitted by: Alexander Botero-Lowry
sobomax [Mon, 19 Dec 2005 09:26:42 +0000 (09:26 +0000)]
If LOADER_BZIP2_SUPPORT is defined, allocate heap in the 1MB-4MB range to
provide enough room for decompression (up to 2.5MB is necessary). This
should be safe to do since we load i386 kernels after the 8MB mark now, so
that 16MB is the minimum amount of RAM necessary to even boot FreeBSD.
sobomax [Mon, 19 Dec 2005 09:00:11 +0000 (09:00 +0000)]
A long, long time ago, when the trees were large and memory was expensive,
the amount of memory directly available to loader(8) and friends was limited
to 640K on i386. Those times passed long ago, and loader(8) can now directly
access up to 4GB of RAM, at least theoretically. At the same time, there are
several places where it's assumed that malloc() will only allocate memory
within the first megabyte.
Remove that assumption by allocating appropriate bounce buffers for BIOS
calls on the stack where necessary.
This allows using memory above the first megabyte for heap if necessary.
pjd [Mon, 19 Dec 2005 03:02:54 +0000 (03:02 +0000)]
- Document another spare flag (0x00000010).
- Add a 'XXX' comment about MNT_ACLS and MNT_BYFSID flags collision and
explain why it is harmless.
- Add a colon after 'XXX' for consistency.
bde [Mon, 19 Dec 2005 00:22:03 +0000 (00:22 +0000)]
Use a minimax polynomial approximation instead of a Pade rational
function approximation for the second step. The polynomial has degree
2 for cbrtf() and 4 for cbrt(). These degrees are minimal for the final
accuracy to be essentially the same as before (slightly smaller).
Adjust the rounding between steps 2 and 3 to match. Unfortunately,
for cbrt(), this breaks the claimed accuracy slightly although incorrect
rounding doesn't. Claim less accuracy since it's not worth pessimizing
the polynomial or relying on exhaustive testing to get insignificantly
more accuracy.
This saves about 30 cycles on Athlons (mainly by avoiding 2 divisions)
so it gives an overall optimization in the 10-25% range (a larger
percentage for float precision, especially in 32-bit mode, since other
overheads are more dominant for double precision, surprisingly more
in 32-bit mode).
marcel [Mon, 19 Dec 2005 00:13:11 +0000 (00:13 +0000)]
Bump __FreeBSD_version to 700009 because:
1. The ELF-64 typedefs are now standardized, so that the libelf port
(devel/libelf) does not need to compensate for not having the
Elf64_Xword and Elf64_Sxword types.
2. ELF Symbol versioning support has been added. This also affects
the libelf port (though configure should detect this correctly).
bde [Sun, 18 Dec 2005 21:46:47 +0000 (21:46 +0000)]
Fixed code to match comments and the algorithm:
- in preparing for the third approximation, actually make t larger in
magnitude than cbrt(x). After chopping, t must be incremented by 2
ulps to make it larger, not 1 ulp since chopping can reduce it by
almost 1 ulp and it might already be up to half a different-sized-ulp
smaller than cbrt(x). I have not found any cases where this is
essential, but the think-time error bound depends on it. The relative
smallness of the different-sized-ulp limited the bug. If there are
cases where this is essential, then the final error bound would be
5/6+epsilon instead of 4/6+epsilon ulps (still < 1).
- in preparing for the third approximation, round more carefully (but
still sloppily to avoid branches) so that the claimed error bound of
0.667 ulps is satisfied in all cases tested for cbrt() and remains
satisfied in all cases for cbrtf(). There isn't enough spare precision
for very sloppy rounding to work:
- in cbrt(), even with the inadequate increment, the actual error was
0.6685 in some cases, and correcting the increment increased this
a little. The fix uses sloppy rounding to 25 bits instead of very
sloppy rounding to 21 bits, and starts using uint64_t instead of 2
words for bit manipulation so that rounding more bits is not much
more costly.
- in cbrtf(), the 0.667 bound was already satisfied even with the
inadequate increment, but change the code to almost match cbrt()
anyway. There is not enough spare precision in the Newton
approximation to double the inadequate increment without exceeding
the 0.667 bound, and no spare precision to avoid this problem as
in cbrt(). The fix is to round using an increment of 2 smaller-ulps
before chopping so that an increment of 1 ulp is enough. In cbrt(),
we essentially do the same, but move the chop point so that the
increment of 1 is not needed.
Fixed comments to match code:
- in cbrt(), the second approximation is good to 25 bits, not quite 26 bits.
- in cbrt(), don't claim that the second approximation may be implemented
in single precision. Single precision cannot handle the full exponent
range without minor but pessimal changes to renormalize, and although
single precision is enough, 25 bit precision is now claimed and used.
Added comments about some of the magic for the error bound 4/6+epsilon.
I still don't understand why it is 4/6+ and not 6/6+ ulps.
Indent comments at the right of code more consistently.
glebius [Sun, 18 Dec 2005 20:26:12 +0000 (20:26 +0000)]
Since the BGE_MBX_TX_HOST_PROD0_LO register is only ever written by
software, we can cache its value in the softc. Eliminates one PCI register
read per call to bge_start().
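The idea can be sketched in userland with a simulated register; this is not the bge(4) code. The struct fake_softc, its fields and all helper names below are hypothetical. Because only the driver ever updates the producer index, a copy kept in the softc is always current and the register never has to be read back.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the device and the driver softc. */
struct fake_softc {
	uint32_t tx_prodidx;	/* cached copy of the producer index */
	uint32_t hw_reg;	/* simulated mailbox register */
	unsigned reg_reads;	/* number of simulated register reads */
};

static uint32_t
reg_read(struct fake_softc *sc)
{
	sc->reg_reads++;
	return (sc->hw_reg);
}

static void
reg_write(struct fake_softc *sc, uint32_t v)
{
	sc->hw_reg = v;
}

/* Before: fetch the index from the device on every call. */
static void
start_uncached(struct fake_softc *sc, uint32_t npkts)
{
	uint32_t prodidx = reg_read(sc);

	reg_write(sc, prodidx + npkts);
}

/* After: advance the cached index and only write it out. */
static void
start_cached(struct fake_softc *sc, uint32_t npkts)
{
	sc->tx_prodidx += npkts;
	reg_write(sc, sc->tx_prodidx);
}
```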
kan [Sun, 18 Dec 2005 19:43:33 +0000 (19:43 +0000)]
Implement ELF symbol versioning using GNU semantics. This code aims
to be compatible with symbol versioning support as implemented by
GNU libc and documented by http://people.redhat.com/~drepper/symbol-versioning
and LSB 3.0.
Implement dlvsym() function to allow lookups for a specific version of
a given symbol.
csjp [Sun, 18 Dec 2005 19:38:43 +0000 (19:38 +0000)]
Provide some basic documentation explaining what the bpf(4) flags are
supposed to mean. Also, add an external reference for bpf now that we
reference flags from that man page.
glebius [Sun, 18 Dec 2005 18:24:27 +0000 (18:24 +0000)]
- Fix the VLAN_INPUT_TAG() macro so that it doesn't touch mtag if
memory allocation fails.
- Remove the fourth argument from VLAN_INPUT_TAG(), which was used
incorrectly in almost all drivers. Indicate failure with an
mbuf value of NULL.
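A userland sketch of the resulting calling convention, under heavy assumptions: struct pkt and struct tag are simplified stand-ins for struct mbuf and struct m_tag, and tag_alloc(), fail_next_alloc and VLAN_INPUT_TAG_SKETCH() are hypothetical names. The point is only that the tag is never dereferenced when allocation fails and that the caller learns about the failure from its own pointer being set to NULL.

```c
#include <stdlib.h>

/* Hypothetical, much-simplified stand-ins for mbuf and m_tag. */
struct tag { unsigned vlan; };
struct pkt { struct tag *tag; };

/* Test hook: when nonzero, the next tag allocation fails. */
static int fail_next_alloc;

static struct tag *
tag_alloc(void)
{
	if (fail_next_alloc) {
		fail_next_alloc = 0;
		return (NULL);
	}
	return (malloc(sizeof(struct tag)));
}

/*
 * Attach a VLAN tag to the packet.  On allocation failure the packet
 * is freed and the caller's pointer is set to NULL; there is no
 * separate error argument.
 */
#define	VLAN_INPUT_TAG_SKETCH(_p, _t) do {		\
	struct tag *mtag_ = tag_alloc();		\
	if (mtag_ != NULL) {				\
		mtag_->vlan = (_t);			\
		(_p)->tag = mtag_;			\
	} else {					\
		free(_p);				\
		(_p) = NULL;				\
	}						\
} while (0)
```

A driver using this convention checks the packet pointer for NULL after the macro and drops out of its receive path if so.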
marcel [Sun, 18 Dec 2005 04:52:37 +0000 (04:52 +0000)]
Make our ELF64 type definitions match standards. In particular this
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.
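The resulting widths can be sketched as follows. The Elf64_* typedefs follow the sizes listed above; the mapping of Elf_Size and Elf_Ssize onto the 64-bit types is an assumption for a 64-bit target, not a copy of the headers.

```c
#include <stdint.h>

typedef uint16_t Elf64_Half;	/* 16-bit */
typedef uint32_t Elf64_Word;	/* 32-bit */
typedef int32_t  Elf64_Sword;	/* 32-bit, signed */
typedef uint64_t Elf64_Xword;	/* 64-bit */
typedef int64_t  Elf64_Sxword;	/* 64-bit, signed */

/* Assumed mapping for a 64-bit target; MI code would use these names. */
typedef Elf64_Xword  Elf_Size;
typedef Elf64_Sxword Elf_Ssize;

/* Compile-time checks of the widths described in the commit. */
_Static_assert(sizeof(Elf64_Half) == 2, "Elf64_Half must be 16-bit");
_Static_assert(sizeof(Elf64_Word) == 4, "Elf64_Word must be 32-bit");
_Static_assert(sizeof(Elf64_Xword) == 8, "Elf64_Xword must be 64-bit");
_Static_assert(sizeof(Elf_Size) == sizeof(Elf_Ssize), "same width");
```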
marcel [Sun, 18 Dec 2005 00:09:12 +0000 (00:09 +0000)]
Get in sync with current ELF definitions. In particular this means:
o Remove the unused and non-standard SHT_NUM, PT_COUNT and DT_COUNT.
o Add the STV_DEFAULT, STV_INTERNAL, STV_HIDDEN and STV_PROTECTED
symbol visibility constants.
o Add the ELF32_ST_VISIBILITY and ELF64_ST_VISIBILITY macros to
get the symbol visibility from the st_other field.
o Add the ELFOSABI_AIX, ELFOSABI_OPENVMS and ELFOSABI_NSK constants.
o Add the ET_LOOS, ET_HIOS, ET_LOPROC and ET_HIPROC constants.
o Further flesh out the list of machine types. Note that EM_ALPHA
remains non-standard. The standard value for EM_ALPHA is given
by EM_ALPHA_STD (which is a non-standard name :-)
o Add the SHN_LOOS, SHN_HIOS and SHN_XINDEX constants.
o Add the SHT_INIT_ARRAY, SHT_FINI_ARRAY, SHT_PREINIT_ARRAY, SHT_GROUP
and SHT_SYMTAB_SHNDX constants.
o Add the SHF_MERGE, SHF_STRINGS, SHF_INFO_LINK, SHF_LINK_ORDER,
SHF_OS_NONCONFORMING, SHF_GROUP and SHF_MASKOS constants.
o Add the PF_MASKOS and PF_MASKPROC constants.
o Add the STB_LOOS and STB_HIOS constants.
o Add the STT_COMMON, STT_LOOS and STT_HIOS constants.
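As a small illustration of the visibility additions: the constants and macros below use the values from the generic ELF specification, where visibility occupies the low two bits of st_other. The helper sym_is_exported() is hypothetical.

```c
#include <stdint.h>

/* Symbol visibility values (generic ELF spec). */
#define STV_DEFAULT	0
#define STV_INTERNAL	1
#define STV_HIDDEN	2
#define STV_PROTECTED	3

/* Visibility lives in the low two bits of st_other. */
#define ELF32_ST_VISIBILITY(oth)	((oth) & 0x3)
#define ELF64_ST_VISIBILITY(oth)	((oth) & 0x3)

/* Hypothetical helper: can other components reference the symbol? */
static int
sym_is_exported(unsigned char st_other)
{
	int vis = ELF64_ST_VISIBILITY(st_other);

	return (vis == STV_DEFAULT || vis == STV_PROTECTED);
}
```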
marcel [Sat, 17 Dec 2005 23:48:07 +0000 (23:48 +0000)]
Fix the ELF64_R_TYPE and ELF64_R_INFO macros. The symbol type is a
32-bit entity. Also, don't cast the resulting symbol type value to
a datatype smaller than the st_info field type as a quick way to
mask off the upper bits as it may cause inconsistent behaviour when
the macro is used (without explicit casting) on varargs functions.
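A sketch of macros in the shape this describes, modelled on the generic ELF specification rather than copied from the tree. The low 32 bits are selected with a mask instead of a narrowing cast, so the result keeps the 64-bit width of the info word and is passed consistently to varargs functions such as printf().

```c
#include <stdint.h>

typedef uint64_t Elf64_Xword;

/* Upper 32 bits: symbol table index. */
#define ELF64_R_SYM(info)	((info) >> 32)
/* Lower 32 bits: type, masked rather than cast to a narrower type. */
#define ELF64_R_TYPE(info)	((info) & UINT64_C(0xffffffff))
/* Compose an info word from a symbol index and a 32-bit type. */
#define ELF64_R_INFO(sym, type)					\
	(((Elf64_Xword)(sym) << 32) + ((type) & UINT64_C(0xffffffff)))
```

With these definitions, `printf("%ju", (uintmax_t)ELF64_R_TYPE(info))` behaves the same whether or not the explicit cast is present on a platform where uintmax_t is 64 bits wide.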
thompsa [Sat, 17 Dec 2005 06:33:51 +0000 (06:33 +0000)]
Change from a callback in if_ethersubr to using EVENTHANDLER in order to detach
span ports when they disappear. The span port does not have a pointer to the
softc so revert r1.31 and bring back the softc linked-list.
njl [Sat, 17 Dec 2005 03:57:10 +0000 (03:57 +0000)]
Clean up unused or poorly utilized KTR values. Remove KTR_FS, KTR_KGDB,
and KTR_IO as they were never used. Remove KTR_CLK since it was only
used for hardclock firing and use KTR_INTR there instead. Remove
KTR_CRITICAL since it was only used for crit enter/exit and use
KTR_CONTENTION instead.
ru [Fri, 16 Dec 2005 22:58:51 +0000 (22:58 +0000)]
Backout pseudo nForce2/3/4 support. These devices (as well as
AMD-8111 SMBus 2.0 controller) are all SMBus 2.0 controllers,
and need another implementation of SMBus access methods, while
this driver supports AMD-756 SMBus 1.0 controller and clones,
including AMD-8111 SMBus 1.0 controller.
Tested by: Vladimir Timofeev (0x006410de),
mezz (0x008410de),
ru (0x00d410de)
All of us got the same(!) nonsense when running ``mbmon -S'',
repeated every four rows.
jhb [Fri, 16 Dec 2005 22:11:52 +0000 (22:11 +0000)]
- Use uintfptr_t rather than int for the kernel profiling index (though it
really should be a fptrdiff_t if we had that) in profclock().
- Only profile kernel pc's that are >= the kernel lowpc to avoid
underflows when computing a profiling index.
- Use the PC_TO_I() macro to compute the kernel profiling index rather than
doing it inline.
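The underflow concern can be sketched as below. The struct prof and prof_index() are hypothetical, uintptr_t stands in for the kernel's uintfptr_t, and PC_TO_I() is written here in the obvious form rather than quoted from the headers. With an unsigned index, a pc below lowpc would wrap around to a huge value, so it is rejected before the subtraction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down profiling state. */
struct prof {
	uintptr_t lowpc;	/* lowest profiled address */
	size_t    textsize;	/* size of the profiled range in bytes */
};

/* Offset of pc from the start of the profiled range. */
#define	PC_TO_I(p, pc)	((uintptr_t)(pc) - (p)->lowpc)

/* Return 1 and store the index if pc lies inside the profiled range. */
static int
prof_index(const struct prof *p, uintptr_t pc, uintptr_t *idx)
{
	if (pc < p->lowpc)
		return (0);	/* the subtraction would wrap around */
	*idx = PC_TO_I(p, pc);
	return (*idx < p->textsize);
}
```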
phk [Fri, 16 Dec 2005 18:56:39 +0000 (18:56 +0000)]
Add an extensible version of our *printf(3) implementation to libc
on probationary terms: it may go away again if it transpires it is
a bad idea.
This extensible printf version will only be used if either
environment variable USE_XPRINTF is defined
or
one of the extension functions is called.
or
the global variable __use_xprintf is set greater than zero.
In all other cases our traditional printf implementation will
be used.
The extensible version is slower than the default printf, mostly
because fewer opportunities for combining I/O operations exist when
faced with extensions. The default printf on the other hand
is a bad case of spaghetti code.
The extension API has a GLIBC compatible part and a FreeBSD version
of same. The FreeBSD version exists because the GLIBC version may
run afoul of our FILE * locking in multithreaded programs and it
eliminates even more of the opportunities for combining I/O operations.
Include three demo extensions which can be enabled if desired: time
(%T), hexdump (%H) and strvis (%V).
%T can format time_t (%T), struct timeval (%lT) and struct timespec (%llT)
in one of two human readable duration formats:
"%.3llT" -> "20349.245"
"%#.3llT" -> "5h39m9.245"
%H will hexdump a sequence of bytes and takes a pointer and a length
argument. The width specifies number of bytes per line.
"%4H" -> "65 72 20 65"
"%+4H" -> "0000 65 72 20 65"
"%#4H" -> "65 72 20 65 |er e|"
"%+#4H" -> "0000 65 72 20 65 |er e|"
%V will dump a string in strvis format.
"%V" -> "Hello\tWor\377ld" (C-style)
"%0V" -> "Hello\011Wor\377ld" (octal)
"%+V" -> "Hello%09Wor%FFld" (http-style)
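To make the two %T duration formats concrete, here is a plain C sketch that reproduces the two documented outputs for a seconds/nanoseconds pair. It is not the libc renderer and does not use the extension API; format_duration() and its "human" flag are hypothetical, and millisecond precision is hard-coded to match the ".3" in the examples.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format sec + nsec either as plain seconds
 * ("20349.245") or as hours/minutes/seconds ("5h39m9.245").
 */
static int
format_duration(char *buf, size_t len, long sec, long nsec, int human)
{
	long ms = nsec / 1000000;	/* keep three fractional digits */

	if (!human)
		return (snprintf(buf, len, "%ld.%03ld", sec, ms));
	return (snprintf(buf, len, "%ldh%ldm%ld.%03ld",
	    sec / 3600, (sec / 60) % 60, sec % 60, ms));
}
```

Calling it with 20349 seconds and 245000000 nanoseconds yields the same two strings as the "%.3llT" and "%#.3llT" examples above.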
alc [Fri, 16 Dec 2005 18:34:14 +0000 (18:34 +0000)]
Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the
ephemeral mappings that are used as the source for three copy
operations from kernel space to user space. There are two reasons for
making this change: (1) Under heavy load exec_map can fill up causing
vm_map_find() to fail. When it fails, the nascent process is aborted
(SIGABRT). Whereas, this reimplementation using sf_buf_alloc()
sleeps. (2) Although it is possible to sleep on vm_map_find()'s
failure until address space becomes available (see kmem_alloc_wait()),
using sf_buf_alloc() is faster. Furthermore, the reimplementation
uses a CPU private mapping, avoiding a TLB shootdown on
multiprocessors.
ru [Fri, 16 Dec 2005 15:03:16 +0000 (15:03 +0000)]
Fix PCI ID of the AMD-8111 System Management controller so it matches
SMBus 1.0 and not SMBus 2.0.
The AMD-8111 hub (datasheet is publicly available) implements both SMBus
2.0 (a separate PCI device) and SMBus 1.0 (a subfunction of the System
Management Controller device whose base I/O address is accessible
through the CSR 0x58). This driver only supports AMD-756 SMBus 1.0
compatible devices.
With the patched sysutils/xmbmon port (to also fix PCI ID and to enable
smb(4) support), I now get:
pciconf:
none0@pci0:7:2: class=0x0c0500 card=0x746a1022 chip=0x746a1022 rev=0x02 hdr=0x00
vendor = 'Advanced Micro Devices (AMD)'
device = 'AMD-8111 SMBus 2.0 Controller'
class = serial bus
subclass = SMBus
amdpm0@pci0:7:3: class=0x068000 card=0x746b1022 chip=0x746b1022 rev=0x05 hdr=0x00
vendor = 'Advanced Micro Devices (AMD)'
device = 'AMD-8111 ACPI System Management Controller'
class = bridge
dmesg:
amdpm0: <AMD 756/766/768/8111 Power Management Controller> port 0x10e0-0x10ff at device 7.3 on pci0
smbus0: <System Management Bus> on amdpm0
# mbmon -A -d
Summary of Detection:
* SMB monitor(s)[ioctl:AMD8111]:
** Winbond Chip W83627HF/THF/THF-A found at slave address: 0x50.
** Analog Dev. Chip ADM1027 found at slave address: 0x5C.
* ISA monitor(s):
** Winbond Chip W83627HF/THF/THF-A found.
I think the confusion comes from the fact that nobody really tried
SMBus with xmbmon :-), since sysutils/xmbmon port doesn't come with
SMBus support enabled, neither in FreeBSD 4, nor in later versions,
so mbmon(1) was just showing the values from the Winbond sensors
accessible through the ISA I/O method (mbmon -I), for me anyway.
On my test machine, the amdpm(4) didn't even attach due to I/O port
allocation failure (who knows what the hell it read from CSR 0x58
of the SMBus 2.0 device :-), which isn't in the CSR space).
I've also checked that lm_sensors.org uses correct PCI ID for SMBus
1.0 of AMD-8111:
This driver is analogous to nForce-2/3/4, i2c-nforce2.c, which
supports SMBus 2.0, and which our amdpm.c does NOT support
(SMBus 2.0 uses a different, ACPI-unified, API to talk to SMBus).
At least I know for sure it doesn't work with my nForce3. :-)
(The xmbmon port will be fixed to correct the PCI ID too and to
enable the smb(4) support.)
ps [Fri, 16 Dec 2005 06:50:55 +0000 (06:50 +0000)]
It seems ciss should ignore overrun and underrun on a SCSI INQUIRY
command. This fixes some weird booting issues on newer versions
of the firmware on the MSA20.
Reported by: Philippe Pegon <Philippe dot Pegon at crc dot u-strasbg dot fr>
scottl [Fri, 16 Dec 2005 05:57:18 +0000 (05:57 +0000)]
Don peril sensitive sunglasses and jack up the MAX_BPAGES limit to 8192
on amd64. If you're going to stuff >4GB into your box, reserving 32MB for
bounce pages amounts to a rounding error in the overall scheme of things.
davidxu [Fri, 16 Dec 2005 02:50:53 +0000 (02:50 +0000)]
With current pthread implementations, a mutex initialization will
allocate a memory block. sscanf calls __svfscanf, which in turn calls
fread; fread triggers mutex initialization, but the mutex is not
destroyed in sscanf, which leads to a memory leak. To avoid the memory
leak and the performance issue, we create a non-MT-safe version of fread,
__fread, and let __svfscanf call __fread instead.
PR: threads/90392
Patch submitted by: dhartmei
MFC after: 7 days
jhb [Thu, 15 Dec 2005 16:30:41 +0000 (16:30 +0000)]
Remove linux_mib_destroy() (which I actually added in between 5.0 and 5.1)
which existed to cleanup the linux_osname mutex. Now that MTX_SYSINIT()
has grown a SYSUNINIT to destroy mutexes on unload, the extra destroy here
was redundant and resulted in panics in debug kernels.
MFC after: 1 week
Reported by: Goran Gajic ggajic at afrodita dot rcub dot bg dot ac dot yu
bde [Thu, 15 Dec 2005 16:23:22 +0000 (16:23 +0000)]
Added comments about the apparently-magic rational function used in
the second step of approximating cbrt(x). It turns out to be neither
very magic nor very good. It is just the (2,2) Pade approximation
to 1/cbrt(r) at r = 1, arranged in a strange way to use fewer operations
at a cost of replacing 4 multiplications by 1 division, which is an
especially bad tradeoff on machines where some of the multiplications
can be done in parallel. A Remez rational approximation would give
at least 2 more bits of accuracy, but the (2,2) Pade approximation
already gives 6 more bits than needed. (Changed the comment which
essentially says that it gives 3 more bits.)
Lower order Pade approximations are not quite accurate enough for
double precision but are plenty for float precision. A lower order
Remez rational approximation might be enough for double precision too.
However, rational approximations inherently require an extra division,
and polynomial approximations work well for 1/cbrt(r) at r = 1, so I
plan to switch to using the latter. There are some technical
complications that tend to cost a division in another way.
glebius [Thu, 15 Dec 2005 09:45:53 +0000 (09:45 +0000)]
o Rewrite bge_encap() to use bus_dmamap_load_mbuf_sg(9), inlining the
callback function bge_dma_map_tx_desc() into the bge_encap() itself.
o If busdma returns EFBIG, try to m_defrag() the packet.
yongari [Thu, 15 Dec 2005 05:48:49 +0000 (05:48 +0000)]
Add bge(4) support for big-endian architectures (part 1/2).
- Give up endianness support and switch to native-endian format for
accessing hardware structures. In fact, the embedded processor for
BCM57xx is a big-endian architecture (MIPS) and it requires native-endian
format for NIC structures. The NIC performs the necessary byte/word
swapping depending on the programmed endian type.
- With the above changes all htole16/htole32 calls are gone.
- Remove the bge_vhandle member in the softc and change to explicit
register access. This may add a performance penalty compared to
the previous memory access, but since most of the accesses are
performed in the initialization phase (e.g. RCB setup), it should be
negligible.
Due to incorrect use of bus_dma(9) in bge(4) it still panics sparc64
systems in the device detach path. The issue will be fixed in the next patch.
emaste [Wed, 14 Dec 2005 23:34:26 +0000 (23:34 +0000)]
When using m_dup(9) to copy more than MHLEN bytes of data, don't create an
mbuf chain that starts with a cluster containing just MHLEN bytes. This
happened because m_dup called m_get or m_getcl depending on the amount of
data to copy, but then always set the size available in the first mbuf to
MHLEN.
Submitted by: Matt Koivisto <mkoivisto at sandvine dot com>
Approved by: jmg
Silence from: rwatson (mentor)