rwatson [Sun, 2 Mar 2008 22:52:14 +0000 (22:52 +0000)]
Don't auto-start or allow extattrctl for UFS2 file systems, as UFS2 has
native extended attributes. This didn't interfere with the operation of
UFS2 extended attributes, but the code shouldn't be running for UFS2.
rwatson [Sun, 2 Mar 2008 21:34:17 +0000 (21:34 +0000)]
Rather than copying out the full audit trigger record, which includes
a queue entry field, just copy out the unsigned int that is the trigger
message. In practice, auditd always requested sizeof(unsigned int), so
the extra bytes were ignored, but copying them out was not the intent.
raj [Sun, 2 Mar 2008 17:05:57 +0000 (17:05 +0000)]
Unify and generalize PowerPC headers, adjust AIM code accordingly.
Rework of this area is a pre-requirement for importing e500 support (and
other PowerPC core variations in the future). Mainly the following
headers are refactored so that we can cover for low-level differences between
various machines within PowerPC architecture:
Some "cleanup" of tcp_mss():
- Move the assigment of the socket down before we first need it.
No need to do it at the beginning and then drop out the function
by one of the returns before using it 100 lines further down.
- Use t_maxopd which was assigned the "tcp_mssdflt" for the corrrect
AF already instead of another #ifdef ? : #endif block doing the same.
- Remove an unneeded (duplicate) assignment of mss to t_maxseg just before
we possibly change mss and re-do the assignment without using t_maxseg
in between.
Reviewed by: silby
No objections: net@ (silence)
MFC after: 5 days
jeff [Sun, 2 Mar 2008 08:20:59 +0000 (08:20 +0000)]
Add support for the new cpu topology api:
- When searching for affinity search backwards in the tree from the last
cpu we ran on while the thread still has affinity for the group. This
can take advantage of knowledge of shared L2 or L3 caches among a
group of cores.
- When searching for the least loaded cpu find the least loaded cpu via
the least loaded path through the tree. This load balances system bus
links, individual cache levels, and hyper-threaded/SMT cores.
- Make the periodic balancer recursively balance the highest and lowest
loaded cpu across each link.
Add support for cpusets:
- Convert the cpuset to a simple native cpumask_t while the kernel still
only supports cpumask.
- Pass the derived cpumask down through the cpu_search functions to
restrict the result cpus.
- Make the various steal functions resilient to failure since all threads
can not run on all cpus any longer.
General improvements:
- Precisely track the lowest priority thread on every runq with
tdq_setlowpri(). Before it was more advisory but this ended up having
pathological behaviors.
- Remove many #ifdef SMP conditions to simplify the code.
- Get rid of the old cumbersome tdq_group. This is more naturally
expressed via the cpu_group tree.
jeff [Sun, 2 Mar 2008 07:58:42 +0000 (07:58 +0000)]
- Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a seperate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.
jeff [Sun, 2 Mar 2008 07:51:29 +0000 (07:51 +0000)]
Add a simple utility for manipulating cpusets. Man page will be available
soon.
- Lists of cpus may be specified with -l with ranges specified as low-high and
commas between individual cpus and ranges. ie -l 0-2,4,6-8.
- cpuset can modified -p pids, -t tids, or -s cpusetids.
- cpuset can -g get the current mask for any of the above.
jeff [Sun, 2 Mar 2008 07:39:22 +0000 (07:39 +0000)]
Add cpuset, an api for thread to cpu binding and cpu resource grouping
and assignment.
- Add a reference to a struct cpuset in each thread that is inherited from
the thread that created it.
- Release the reference when the thread is destroyed.
- Add prototypes for syscalls and macros for manipulating cpusets in
sys/cpuset.h
- Add syscalls to create, get, and set new numbered cpusets:
cpuset(), cpuset_{get,set}id()
- Add syscalls for getting and setting affinity masks for cpusets or
individual threads: cpuid_{get,set}affinity()
- Add types for the 'level' and 'which' parameters for the cpuset. This
will permit expansion of the api to cover cpu masks for other objects
identifiable with an id_t integer. For example, IRQs and Jails may be
coming soon.
- The root set 0 contains all valid cpus. All thread initially belong to
cpuset 1. This permits migrating all threads off of certain cpus to
reserve them for special applications.
Sponsored by: Nokia
Discussed with: arch, rwatson, brooks, davidxu, deischen
Reviewed by: antoine
kaiw [Sun, 2 Mar 2008 07:01:01 +0000 (07:01 +0000)]
- Do not malloc buffer for 0-size member when reading from archive.
- Fix a malloc buffer overrun: Use a while loop to check whether
the string buffer is big enough after resizing, since doubling
once might not be enough when a very long member name or symbol
name is provided.
- Fix typo.
Reported by: Michael Plass <mfp49_freebsd@plass-family.net>
Tested by: Michael Plass <mfp49_freebsd@plass-family.net>
Reviewed by: jkoshy
Approved by: jkoshy
marcel [Sun, 2 Mar 2008 00:52:49 +0000 (00:52 +0000)]
Add support for VTOC8 labels (aka sun disk labels). When a label does
not have VTOC information about the partitions, it will be created.
This is because the VTOC information is used for the partition type
and FreeBSD's sunlabel(8) does not create nor use VTOC information.
For this purpose, new tags have been added to support FreeBSD's
partition types.
marcel [Sat, 1 Mar 2008 22:54:42 +0000 (22:54 +0000)]
Make the vm_pmap field of struct vmspace the last field in the
structure. This allows per-CPU variations of struct pmap on a
single architecture without affecting the machine-independent
fields. As such, the PMAP variations don't affect the ABI. They
become part of it.
attilio [Sat, 1 Mar 2008 22:14:45 +0000 (22:14 +0000)]
Split the kernel / userland interface with propert _KERNEL stub.
This should have been always there, but an userland brekage for the
recent lockmgr modifies showed it.
gibbs [Sat, 1 Mar 2008 21:58:34 +0000 (21:58 +0000)]
In est_acpi_info(), initialize count before passing its pointer to
CPUFREQ_DRV_SETTINGS(). The value of count on input is used to
prefent overflow of the settings buffer passed into CPUFREQ_DRV_SETTINGS().
This corrects the "est: CPU supports Enhanced Speedstep, but is not recognized."
error on my system.
attilio [Sat, 1 Mar 2008 20:05:20 +0000 (20:05 +0000)]
Update lockmgr manpage with last lockmgr modifies:
- Remove LK_SLEEPFAIL and LK_NOWAIT for lockinit() and add LK_QUIET and
LK_NOPROFILE
- Include sys/lock.h as mandatory for the lockmgr support
attilio [Sat, 1 Mar 2008 19:53:26 +0000 (19:53 +0000)]
Bump __FreeBSD_version in order to reflect:
- lockwaiters() axing out
- BUF_LOCKWAITERS() axing out
- brelvp() prototype changing
- lockinit() accepted arguments() range changing
attilio [Sat, 1 Mar 2008 19:47:50 +0000 (19:47 +0000)]
- Handle buffer lock waiters count directly in the buffer cache instead
than rely on the lockmgr support [1]:
* bump the waiters only if the interlock is held
* let brelvp() return the waiters count
* rely on brelvp() instead than BUF_LOCKWAITERS() in order to check
for the waiters number
- Remove a namespace pollution introduced recently with lockmgr.h
including lock.h by including lock.h directly in the consumers and
making it mandatory for using lockmgr.
- Modify flags accepted by lockinit():
* introduce LK_NOPROFILE which disables lock profiling for the
specified lockmgr
* introduce LK_QUIET which disables ktr tracing for the specified
lockmgr [2]
* disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it
can only be used on a per-instance basis
- Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer
used
This patch breaks KPI so __FreBSD_version will be bumped and manpages
updated by further commits. Additively, 'struct buf' changes results in
a disturbed ABI also.
[2] Really, currently there is no ktr tracing in the lockmgr, but it
will be added soon.
[1] Submitted by: kib
Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
scf [Sat, 1 Mar 2008 00:02:12 +0000 (00:02 +0000)]
Remove a dereference. It was unintended and a no-op.
Use the correct value of errno. Although the errno value passed into
printf() follows the *env() call, it is not guaranteed to be the errno
from that call. When I wrote the regression tester, the environment I
used did pass the errno from the call. Consolidate the print for the
return code and errno into a function in the process of fixing this.
marcel [Fri, 29 Feb 2008 22:41:36 +0000 (22:41 +0000)]
Follow-up improvements to the handling of false positives: If the
partition table is empty, check to see if we have something that
looks sufficiently like a BPB. On non-i386 machines, the boot
sector typically doesn't contain boot code; the end of the boot
sector is all zeroes. This is also where the partition table is
for MBRs.
We only check the sector size and cluster size, as that seems to
be the most reliable across implementations, BPB versions and
platforms.
jfv [Fri, 29 Feb 2008 21:50:11 +0000 (21:50 +0000)]
This change introduces a split to the Intel E1000 driver, now rather than
just em, there is an igb driver (this follows behavior with our Linux drivers).
All adapters up to the 82575 are supported in em, and new client/desktop support
will continue to be in that adapter.
The igb driver is for new server NICs like the 82575 and its followons.
Advanced features for virtualization and performance will be in this driver.
Also, both drivers now have shared code that is up to the latest we have
released. Some stylistic changes as well.
jhb [Fri, 29 Feb 2008 19:18:09 +0000 (19:18 +0000)]
With the recent change to enable CPU brands from the VIA chips, the
code to add padlock features to the CPU model on VIA CPUs was no longer
effective. Change the code to instead output a separate printf during
dmesg for VIA Padlock features similar to other cpuid feature bitmasks.
fanf [Fri, 29 Feb 2008 12:57:14 +0000 (12:57 +0000)]
Allow #if defined SYM as well as #if defined(SYM). Fix an abort
caused by files that have #endif and no newline on the last line
(reported by joe@). Also fix a benign uninitialized variable bug.
Update and tidy the copyright.
nyan [Fri, 29 Feb 2008 05:01:10 +0000 (05:01 +0000)]
MFi386: revision 1.658
Add "show sysregs" command to ddb. On i386, this gives gdt, idt, ldt,
cr0-4, etc. Support should be added for other platforms that have a
different set of registers for system use.
sam [Fri, 29 Feb 2008 04:07:07 +0000 (04:07 +0000)]
Fix adhoc mode to scan all available channels for a bss to join
while still restricting auto-channel select to only those channels
permitted by regulatory constraints (sorta, we're still missing the
checks to honor radar and noadhoc status on channels). This somehow
got lost in the initial merge of the revised scanning code.
yongari [Fri, 29 Feb 2008 03:38:12 +0000 (03:38 +0000)]
Workaround GMAC hardware hang of Yukon II on the receipt of pause
frames. This bug seems to happen on certain hardware model/revision
(e.g. 88E8053) but it's not identified which hardwares are affected.
Revision 1.4 of if_mskreg.h was not enough to workaround the bug.
To workaround it, inrease GMAC FIFO threshold by one FIFO word to
flush received pause frames.
marcel [Thu, 28 Feb 2008 22:30:41 +0000 (22:30 +0000)]
Better handle false positives. The MBR differs from the boot sector
only because there's a partition table where the boot sector has
boot code. Boot sectors without boot code look like a MBR for all
practical purposes. This change adds a check for the partition table
and fails the probe when it's obvously invalid. The assumption being
that the sector contains a boot sector and not a MBR.
More checks are needed to distinguish a boot secto without boot code
from a (empty) MBR.
jhb [Thu, 28 Feb 2008 17:59:54 +0000 (17:59 +0000)]
- Check for the extended CPUID registers on VIA CPUs so we can get the
brand string.
- Fix a nit in the previous commit. "Eden" is a product name, not a core
name. The new ID is still for an "Esther" core.
jhb [Thu, 28 Feb 2008 17:49:23 +0000 (17:49 +0000)]
Tweak the verbose disk printing a bit:
- Consolidate the code to humanize the size of a disk partition into a
single function based on the code for GPT partitions and use it for
GPT partitions, BSD slices, and BSD partitions.
- Teach the humanize code to use KB for small partitions (e.g. GPT boot
partitions now show up as 64KB rather than 0MB).
- Pad a few partition type names out so that things line up in the
common case.
jhb [Thu, 28 Feb 2008 17:08:05 +0000 (17:08 +0000)]
Rev 1.72 fixed a bug where if /boot.config changed the console its contents
weren't displayed on the new console. However, the config string has been
altered as part of being parsed so we only display the first option. Fix
this by saving a copy of /boot.config before parsing it and displaying the
saved copy after parsing.
bde [Thu, 28 Feb 2008 16:22:36 +0000 (16:22 +0000)]
Fix and improve some magic numbers for the "medium size" case.
e_rem_pio2.c:
This case goes up to about 2**20pi/2, but the comment about it said that
it goes up to about 2**19pi/2.
It went too far above 2**pi/2, giving a multiplier fn with 21 significant
bits in some cases. This would be harmful except for a numerical
accident. It happens that the terms of the approximation to pi/2,
when rounded to 33 bits so that multiplications by 20-bit fn's are
exact, happen to be rounded to 32 bits so multiplications by 21-bit
fn's are exact too, so the bug only complicates the error analysis (we
might lose a bit of accuracy but have bits to spare).
e_rem_pio2f.c:
The bogus comment in e_rem_pio2.c was copied and the code was changed
to be bug-for-bug compatible with it, except the limit was made 90
ulps smaller than necessary. The approximation to pi/2 was not
modified except for discarding some of it.
The same rough error analysis that justifies the limit of 2**20pi/2
for double precision only justifies a limit of 2**18pi/2 for float
precision. We depended on exhaustive testing to check the magic numbers
for float precision. More exaustive testing shows that we can go up
to 2**28pi/2 using a 53+25 bit approximation to pi/2 for float precision,
with a the maximum error for cosf() and sinf() unchanged at 0.5009
ulps despite the maximum error in rem_pio2f being ~0.25 ulps. Implement
this.
scf [Thu, 28 Feb 2008 04:09:08 +0000 (04:09 +0000)]
Replace the use of warnx() with direct output to stderr using _write().
This reduces the size of a statically-linked binary by approximately 100KB
in a trivial "return (0)" test application. readelf -S was used to verify
that the .text section was reduced and that using strlen() saved a few
more bytes over using sizeof(). Since the section of code is only called
when environ is corrupt (program bug), I went with fewer bytes over fewer
cycles.
I made minor edits to the submitted patch to make the output resemble
warnx().
jhb [Wed, 27 Feb 2008 19:02:02 +0000 (19:02 +0000)]
File descriptors are an int, but our stdio FILE object uses a short to hold
them. Thus, any fd whose value is greater than SHRT_MAX is handled
incorrectly (the short value is sign-extended when converted to an int).
An unpleasant side effect is that if fopen() opens a file and gets a
backing fd that is greater than SHRT_MAX, fclose() will fail and the file
descriptor will be leaked. Better handle this by fixing fopen(), fdopen(),
and freopen() to fail attempts to use a fd greater than SHRT_MAX with
EMFILE.
At some point in the future we should look at expanding the file descriptor
in FILE to an int, but that is a bit complicated due to ABI issues.
MFC after: 1 week
Discussed on: arch
Reviewed by: wollman
rwatson [Wed, 27 Feb 2008 17:12:22 +0000 (17:12 +0000)]
Replace somewhat awkward audit trail rotation scheme, which involved the
global audit mutex and condition variables, with an sx lock which protects
the trail vnode and credential while in use, and is acquired by the system
call code when rotating the trail. Previously, a "message" would be sent
to the kernel audit worker, which did the rotation, but the new code is
simpler and (hopefully) less error-prone.
dwmalone [Wed, 27 Feb 2008 13:52:33 +0000 (13:52 +0000)]
Dummynet has a limit of 100 slots queue size (or 1MB, if you give
the limit in bytes) hard coded into both the kernel and userland.
Make both these limits a sysctl, so it is easy to change the limit.
If the userland part of ipfw finds that the sysctls don't exist,
it will just fall back to the traditional limits.
(100 packets is quite a small limit these days. If you want to test
TCP at 100Mbps, 100 packets can only accommodate a DBP of 12ms.)
Note these sysctls in the man page and warn against increasing them
without thinking first.
scottl [Wed, 27 Feb 2008 08:47:13 +0000 (08:47 +0000)]
When probing a newly found device, don't automatically assume that the
device supports retrieving a serial number. Instead, first query the
list of VPD pages it does support, and only query the serial number if
it's supported, else silently move on. This eliminates a lot of noise
during verbose booting, and will likely eliminate the need for most
NOSERIAL quirks.