marius [Tue, 27 Apr 2010 18:41:16 +0000 (18:41 +0000)]
On sparc64 obtain the initiator ID to be used for SPI HBAs from the
Open Firmware device tree in order to match what the PROM built-in
driver uses. This is especially important when netbooting Fujitsu
Siemens PRIMEPOWER250 as in that case the built-in driver isn't used
and the port facts PortSCSIID defaults to 0, conflicting with the
disk at the same address.
marius [Tue, 27 Apr 2010 18:05:33 +0000 (18:05 +0000)]
- On sparc64 obtain the initiator ID from the Open Firmware device tree
in order to match what the PROM built-in driver uses.
- Remove some no longer used includes.
Use _exit(2) system call directly instead of using exit(3) in signal
handler, as the latter is not guaranteed to be signal safe, and we
do not really care about flushing the stream during SIGINT.
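A minimal sketch of the pattern described above, not the committed code. The helper name sigint_exit_status() and the exit status 1 are assumptions made for illustration only.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Async-signal-safe handler: _exit(2) terminates immediately, without
 * running atexit(3) handlers or flushing stdio buffers as exit(3) would.
 */
static void
on_sigint(int signo)
{
	(void)signo;
	_exit(1);
}

/*
 * Hypothetical demo helper: fork a child that installs the handler and
 * raises SIGINT against itself.  Returns the child's exit status, or -1
 * if the child could not be created or did not exit normally.
 */
static int
sigint_exit_status(void)
{
	int status;
	pid_t pid;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		signal(SIGINT, on_sigint);
		raise(SIGINT);
		_exit(0);	/* only reached if the handler did not run */
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status));
}
```

Calling sigint_exit_status() from a test program should report the status passed to _exit() in the handler.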
Enhance the historic behaviour of raw sockets and jails in a way
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.
This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.
Make sure IPv6 source address selection does not change interface
addresses while walking the IPv6 address list if in the jail case
something is connecting to ::1.
Reported by: Pieter de Boer (pieter thedarkside.nl)
Tested by: Pieter de Boer (pieter thedarkside.nl)
MFC after: 4 days
Fix a regression where DVMRP diagnostic traffic, such as that used
by mrinfo and mtrace, was dropped by the IGMP TTL check. IGMP control
traffic must always have a TTL of 1.
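One way to picture the distinction is a TTL check that applies only to membership queries, reports and leaves, and lets DVMRP and mtrace types through. This is an illustrative sketch, not the kernel's igmp input path; the enum and function names are hypothetical, and only the numeric type values follow the IGMP wire format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IGMP message type values on the wire; the names are hypothetical. */
enum {
	SK_IGMP_QUERY     = 0x11,	/* membership query */
	SK_IGMP_V1_REPORT = 0x12,
	SK_IGMP_DVMRP     = 0x13,	/* DVMRP, as used by mrinfo */
	SK_IGMP_V2_REPORT = 0x16,
	SK_IGMP_V2_LEAVE  = 0x17,
	SK_IGMP_MTRACE    = 0x1f,	/* mtrace request */
	SK_IGMP_V3_REPORT = 0x22
};

/*
 * Return true if a message of the given type may be accepted with the
 * given IP TTL.  Control traffic must carry TTL 1; other types (DVMRP,
 * mtrace, ...) are not subject to the check in this sketch.
 */
static bool
igmp_ttl_acceptable(uint8_t type, uint8_t ttl)
{
	switch (type) {
	case SK_IGMP_QUERY:
	case SK_IGMP_V1_REPORT:
	case SK_IGMP_V2_REPORT:
	case SK_IGMP_V2_LEAVE:
	case SK_IGMP_V3_REPORT:
		return (ttl == 1);
	default:
		return (true);
	}
}
```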
Require the option that's mapped be listed in the options file. This
will allow people with old config options to either have it just work
(if config is new enough), or get a version error (if their config is
about 7.0 or newer) rather than getting a cryptic error about
duplicated options in the options file, or getting an error about an
unknown option, at which point they'd update their config file only to
learn they need a new config, only to learn they didn't really need to
update their config file... All this because our version checking was
in the wrong place for the past decade...
# hopefully this is the last change, and we'll be able to config with an
# 8.0 GENERIC file on stable/8 after I merge this change and add the
# compat options.
Redo how we add compat options so as to be compatible with old
versions of config. Remove support for the syntax OLD = NEW from the
options file, and instead have a new file $S/conf/options-compat.
This file will be parsed as OLD NEW on each line. Bump version of
config. Since nothing in -current ever used this, there are no hazards
for current users, so I'm not bumping the version in the
Makefiles.$MACHINE. No need, really, for this version bump in
-current, but this was introduced into -stable before I realized the
version check was ineffective there, so the version bump doesn't hurt
here and keeps the two branches in sync, versionwise, after the MFC.
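A rough sketch of parsing one "OLD NEW" line, assuming whitespace-separated names of at most 63 characters and '#' comment lines. The struct, the limits and the comment handling are assumptions for illustration and are not taken from config(8).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OPT_NAME_MAX 64

/* Hypothetical record for one compat mapping. */
struct compat_opt {
	char old_name[OPT_NAME_MAX];
	char new_name[OPT_NAME_MAX];
};

/*
 * Parse a line of the form "OLD NEW".  Returns 1 on success, 0 for a
 * blank or comment line, and -1 if the line does not hold exactly two
 * whitespace-separated names.
 */
static int
parse_compat_line(const char *line, struct compat_opt *out)
{
	char extra[2];
	int n;

	assert(line != NULL && out != NULL);
	line += strspn(line, " \t");
	if (*line == '\0' || *line == '\n' || *line == '#')
		return (0);
	/* The third conversion only succeeds if trailing junk is present. */
	n = sscanf(line, "%63s %63s %1s", out->old_name, out->new_name,
	    extra);
	return (n == 2 ? 1 : -1);
}
```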
MFi386 r207205
Clearing a page table entry's accessed bit (PG_A) and setting the
page's PG_REFERENCED flag in pmap_protect() can't really be justified,
so don't do it.
Move checking the version up from Makefile generation to just after
we've parsed the config file. Makefile generation is too late if
we've introduced changes to the syntax of the metafiles to warn about
version skew, since we have to try to parse them and we get a parse
error that's rather baffling to the user rather than a 'your config is
too old, upgrade' which we should get.
We have to defer doing it until after we've read the user's config
file because we define machinename there. The version required to
compile the kernel is encoded in Makefile.machinename. There's no
real reason for this to be the case, but changing it now would
introduce some logistical issues that I'd rather avoid for the moment.
I intend to revisit this if we're still using config in FreeBSD 10.
This also means that we cannot introduce any config metafile changes
that result in a syntax error or other error for the user until 9.0 is
released. Otherwise, we break the upgrade path, or at least reduce
the usefulness of the error messages we generate.
# This implies that the config file option mapping will need to be redone.
It seems ale(4) controllers do not like to see TCP payload in the
first descriptor in the TSO case; otherwise the controller can generate
bad frames during TSO. To address this, make sure to pull up the
ethernet + IP + TCP headers (with options) into the first buffer. Also
ensure the buffer length of the first descriptor for TSO covers the
entire ethernet + IP + TCP headers with options, and set up an
additional Tx descriptor if the first buffer includes TCP payload.
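The header-length arithmetic behind this can be sketched as follows, assuming an untagged Ethernet frame and IPv4. The function names are hypothetical and this is not the ale(4) driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETHER_HDR_BYTES 14	/* untagged Ethernet header (assumption) */

/*
 * Total header length for a TSO frame.  ip_hl and th_off are the IPv4
 * and TCP header length fields, both counted in 32-bit words, so they
 * already include any IP or TCP options.
 */
static size_t
tso_header_len(unsigned ip_hl, unsigned th_off)
{
	assert(ip_hl >= 5 && ip_hl <= 15);
	assert(th_off >= 5 && th_off <= 15);
	return (ETHER_HDR_BYTES + ip_hl * 4 + th_off * 4);
}

/*
 * True if the first buffer holds TCP payload beyond the headers, in
 * which case the payload part needs a Tx descriptor of its own.
 */
static bool
tso_needs_split(size_t first_buf_len, size_t hdr_len)
{
	return (first_buf_len > hdr_len);
}
```

For example, with no IP options and 12 bytes of TCP options (th_off of 8) the headers span 14 + 20 + 32 = 66 bytes.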
marius [Mon, 26 Apr 2010 20:19:49 +0000 (20:19 +0000)]
Don't bother enabling interrupts before we're ready to handle them. This
prevents the firmware of Fujitsu Siemens PRIMEPOWER250, which both causes
stray interrupts and erroneously enables interrupts at least when calling
SUNW,set-trap-table, from shooting us in the foot.
marius [Mon, 26 Apr 2010 19:13:10 +0000 (19:13 +0000)]
Add OF_getscsinitid(), a helper similar to OF_getetheraddr() but for
obtaining the initiator ID to be used for SPI controllers from the
Open Firmware device tree.
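As a userland toy model only (not the kernel helper), such a lookup could walk from the controller's node towards the root until a node supplies an initiator ID. The struct, the function name, the walk towards the root and the fallback ID of 7 are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a firmware device tree node. */
struct fw_node {
	const struct fw_node *parent;	/* NULL at the root */
	int has_initiator_id;		/* non-zero if the property exists */
	int initiator_id;
};

#define DEFAULT_INITIATOR_ID 7		/* assumed fallback */

/*
 * Return the initiator ID from the nearest node, starting at "node" and
 * moving towards the root, that defines one; fall back to the default
 * if no node does.
 */
static int
lookup_initiator_id(const struct fw_node *node)
{
	for (; node != NULL; node = node->parent)
		if (node->has_initiator_id)
			return (node->initiator_id);
	return (DEFAULT_INITIATOR_ID);
}
```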
Better handling of ipv6_default_interface using
net.inet6.ip6.use_defaultzone=1. It now works for IPv6 link-local
unicast addresses as well as IPv6 link-local multicast addresses.
- Rework the underlying ALQ storage to be a circular buffer, which amongst other
things allows variable length messages to be easily supported.
- Extend KPI with alq_writen() and alq_getn() to support variable length
messages, which is enabled at ALQ creation time depending on the
arguments passed to alq_open(). Also add variants of alq_open() and
alq_post() that accept a flags argument. The KPI is still fully
backwards compatible and shouldn't require any change in ALQ consumers
unless they wish to utilise the new features.
- Introduce the ALQ_NOACTIVATE and ALQ_ORDERED flags to allow ALQ consumers
to have more control over IO scheduling and resource acquisition
respectively.
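To illustrate why a circular buffer makes variable length messages easy, here is a small byte ring in userland C. It is a sketch of the general technique only; the names and the fixed 16-byte size are hypothetical and it does not reflect the ALQ KPI or its locking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 16

struct ring {
	char   data[RING_SIZE];
	size_t head;	/* offset of the next byte to write */
	size_t tail;	/* offset of the next byte to read */
	size_t used;	/* bytes currently stored */
};

/* Append len bytes, wrapping at the end; -1 if there is no room. */
static int
ring_write(struct ring *r, const void *buf, size_t len)
{
	size_t first;

	if (len > RING_SIZE - r->used)
		return (-1);
	first = RING_SIZE - r->head;	/* room before the wrap point */
	if (first > len)
		first = len;
	memcpy(r->data + r->head, buf, first);
	memcpy(r->data, (const char *)buf + first, len - first);
	r->head = (r->head + len) % RING_SIZE;
	r->used += len;
	return (0);
}

/* Remove up to len bytes into buf; returns the number of bytes read. */
static size_t
ring_read(struct ring *r, void *buf, size_t len)
{
	size_t first;

	if (len > r->used)
		len = r->used;
	first = RING_SIZE - r->tail;	/* bytes before the wrap point */
	if (first > len)
		first = len;
	memcpy(buf, r->data + r->tail, first);
	memcpy((char *)buf + first, r->data, len - first);
	r->tail = (r->tail + len) % RING_SIZE;
	r->used -= len;
	return (len);
}
```

Because only byte offsets are tracked, a message of any length up to the free space can be stored, including one that straddles the end of the buffer.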
Incremental reduction of delta with head_page_lock_2 branch
- replace modification of pmap resident_count with pmap_resident_count_{inc,dec}
- the pv list is protected by the pmap lock, but in several cases we are relying
on the vm page queue mutex, move pv_va read under the pmap lock
Clearing a page table entry's accessed bit (PG_A) and setting the
page's PG_REFERENCED flag in pmap_protect() can't really be justified.
In contrast to pmap_remove() or pmap_remove_all(), the mapping is not
being destroyed, so the notion that the page was accessed is not lost.
Moreover, clearing the page table entry's accessed bit and setting the
page's PG_REFERENCED flag can throw off the page daemon's activity
count calculation. Finally, in my tests, I found that 15% of the
atomic memory operations being performed by pmap_protect() were only
to clear PG_A, and not change protection. This could, by itself, be
fixed, but I don't see the point given the above argument.
Remove a comment from pmap_protect_pde() that is no longer meaningful
after the above change.
Make hash, type and ulimit available via execve().
These are specified by POSIX but are not special builtins, and therefore
need to be available via execve() and utilities like time, nohup, xargs.
(Note that hash was moved from the XSI option to the base in the 2008
standard.)
Like most of the POSIX "regular builtin commands", these need to be executed
in a shell environment for full functionality, although they may still be of
some use outside one.
Unlike the POSIX special and regular builtin commands, POSIX does not
require these to be found before a PATH search, although that could be an
oversight.
Like some of the utilities already provided by usr.bin/alias, these may lead
to confusing results when invoked from csh(1).
Provide compat32 shims for bpf(4), except zero-copy facilities.
bd_compat32 field of struct bpf_d is kept unconditionally to not
impose the requirement of including "opt_compat.h" on all numerous
users of bpfdesc.h.
Submitted by: jhb (version for 6.x)
Reviewed and tested by: emaste
MFC after: 2 weeks
symlink(7): The ownership of symlinks is used by the system,
in at least three ways, so do not say it is ignored:
* who may delete/rename a symlink in a sticky directory
* who may do lchflags(2)/lchown(2)/lchmod(2)
* whose inode quota is charged
kvm(3): Mention that some of the functions use sysctl(3) instead of kmem.
Additionally, because of sysctl(3) use (which is generally good), behaviour
for crash dumps differs slightly from behaviour for live kernels and this
will probably never be fixed entirely, so weaken that claim.
sysctl(3): Update description of various kern.* variables.
Also add xrefs for confstr(3) (as sysconf(3) but for strings) and kvm(3)
(which is a more convenient way to access some of the variables).
Fix undo for schemes that have internal partitions. Internal partitions
do not constitute user-visible or active partitions and as such should
not prevent undoing pending operations.
While here, initialize the last usable sector for the placeholder geom
based on the null scheme, created to allow undoing the destruction of
a scheme. This gives consistent output with "gpart show".
Based on a patch from: "Andrey V. Elsukov" <bu7cher@yandex.ru>
An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs
during the grace period after startup. This grace period must
be at least the lease duration, which is typically 1-2 minutes.
It seems prudent for the experimental NFS client to wait a few
seconds before retrying such an RPC, so that the server isn't
flooded with non-recovery RPCs during recovery. This patch adds
an argument to nfs_catnap() to implement a 5 second delay
for this case.
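The retry behaviour can be sketched in userland as a loop that sleeps between attempts while the server answers with the grace error. The function names, the callback interface and the retry cap are assumptions for illustration; fake_rpc() is a demo stand-in for a server, and none of this is the client's actual code.

```c
#include <assert.h>
#include <unistd.h>

#define ERR_GRACE 10013		/* numeric value of NFS4ERR_GRACE */

typedef int (*rpc_fn)(void *arg);

/*
 * Issue the RPC, and while the server answers ERR_GRACE wait delay_sec
 * seconds before trying again, up to max_tries attempts.  Returns the
 * first non-grace result, or ERR_GRACE if the attempts run out.
 */
static int
rpc_with_grace_retry(rpc_fn rpc, void *arg, unsigned delay_sec, int max_tries)
{
	int error, tries;

	for (tries = 0; tries < max_tries; tries++) {
		error = rpc(arg);
		if (error != ERR_GRACE)
			return (error);
		sleep(delay_sec);	/* back off so recovery is not flooded */
	}
	return (ERR_GRACE);
}

/* Demo server: answers ERR_GRACE *arg more times, then succeeds. */
static int
fake_rpc(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (ERR_GRACE);
	}
	return (0);
}
```

A caller modelling the behaviour described above would pass a delay of 5 seconds.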
Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
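The difference between the two queries can be shown with a toy array of page table entries. This is only a model of "test" versus "test and clear"; the function names are hypothetical and the real pmap functions operate on vm_page structures and pv lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_A 0x20u	/* accessed bit; 0x20 is the x86 position */

/* Non-destructive query: reports a reference but leaves the bits set. */
static bool
toy_is_referenced(const uint32_t *ptes, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ptes[i] & PTE_A)
			return (true);
	return (false);
}

/* Destructive query: counts references and clears each accessed bit. */
static int
toy_ts_referenced(uint32_t *ptes, size_t n)
{
	size_t i;
	int count = 0;

	for (i = 0; i < n; i++)
		if (ptes[i] & PTE_A) {
			ptes[i] &= ~PTE_A;
			count++;
		}
	return (count);
}
```

After the destructive variant has run, a later observer (the page daemon in the text above) no longer sees the references, which is the first problem described.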
Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.
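The idea of a compile-time size check can be sketched in userland with C11 _Static_assert, which plays a role similar to the kernel's CTASSERT. The struct layout and the size constant below are hypothetical and far smaller than the real kinfo_proc32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-layout record exported to 32-bit consumers. */
struct toy_info32 {
	int32_t  ti_structsize;
	int32_t  ti_pid;
	uint32_t ti_flags;
	char     ti_comm[20];
};

/* Expected size; a per-architecture header would define this. */
#define TOY_INFO32_SIZE 32

/*
 * The build fails here if a field is added or resized without the
 * constant being updated, so ABI drift is caught at compile time.
 */
_Static_assert(sizeof(struct toy_info32) == TOY_INFO32_SIZE,
    "struct toy_info32 size changed; update TOY_INFO32_SIZE");
```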
marius [Sat, 24 Apr 2010 12:11:41 +0000 (12:11 +0000)]
Add a TestFloat based test suite for floating-point implementations
currently supporting sparc64. After a `make depend all` there are
three programs; testsoftfloat for testing against the SoftFloat in
src/lib/libc/softfloat for reference purposes, testemufloat for
testing the emulator source in src/lib/libc/sparc64/fpu and testfloat
for testing with the installed libc. Support for other architectures
can be added as needed.
jeff [Sat, 24 Apr 2010 07:05:35 +0000 (07:05 +0000)]
- Merge soft-updates journaling from projects/suj/head into head. This
brings in support for an optional intent log which eliminates the need
for background fsck on unclean shutdown.
Sponsored by: iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
o) Remove default MAXMEM on SWARM; pmap can readily use lmem for >512M
physical addresses.
o) Set a local maxmem in sb_machdep.c to avoid trying to use pages over 2^64
under 32-bit ABIs. Our pmap needs to be corrected to use vm_paddr_t consistently,
then we can make vm_paddr_t 64-bit under 32-bit ABIs and add code in pmap
to limit phys_avail by the maximum PFN that a 32-bit PTE can hold.
Large memory mappings are always CPU local and always done with interrupts
disabled. Be doubly-sure that we don't try to do a TLB shootdown on SMP
systems for those mappings.
Address some WITNESS panics that occur when using the via driver.
Some of these cases should be safe in a non-atomic fashion, however
since all of the driver ioctls are locked, a lot of work is required to
fix it correctly. Just don't sleep now.
- Take libinstall.a out of pkg_install and make it a proper shared library.
- Rework the wrapper support to check libpkg version as well as pkg_install
version.
- Add libfetch to _prebuild_libs.
- There are no new features introduced.
Notes: the API is not stable, so basically, do not use libpkg in your
projects for now. Also there's no manpage for libpkg yet, because the API
will change drastically. I repeat, do not use libpkg for now.
* Fix compilation when using SCTP_AUDITING_ENABLED.
* Fix delaying of SACK by taking out old optimization code
which does not optimize anymore.
* Fix fast retransmission of chunks abandoned by the
"number of retransmissions" policy.
Implement the resize command for resizing partitions. Without new
size, the partition in question is resized to fill all available
space. Quality work by Andrey!
Submitted by: "Andrey V. Elsukov" <bu7cher@yandex.ru>
When the experimental NFS client is handling an NFSv4 server reboot
with delegations enabled, the recovery could fail if the renew
thread is trying to return a delegation, since it will not do the
recovery. This patch fixes the above by having nfscl_recalldeleg()
fail with the I/O operations returning EIO, so that they will be
attempted later. Most of the patch consists of adding an argument
to various functions to indicate the delegation recall case where
this needs to be done.
Initialize interrupt moderation control register. The magic value
was chosen by a lot of trial and error. The chosen value shows
good interrupt moderation without additional latency.
Without this change, the controller can generate more than 140k
interrupts per second under high network load.
This time, abandon the use of busdma and start interacting with the VM
system directly. Make use of the new kmem_alloc_attr() which allows us
to easily allocate non-contiguous pages to back the GART table. This
should help a lot when starting or restarting X after the system has
been running for a while and memory has become fragmented.
Remove explicit setting of NO_CTF in WMAKEENV and in the make call for
the buildkernel. This way makeoptions WITH_CTF=yes not only works when
compiling the traditional way, but also when using buildkernel. This
does not enable the CTF part of the world, it still defaults to without
CTF info.
The cross/build-tools/bootstrap targets are not affected by this, they
still have and should keep the explicit NO_CTF.
Revert r206179 (by imp) and do something similar which is more consistent
with all other corresponding CTF places by changing the corresponding
code which is generated by config(8). Or in short, move the '@' from
the variable definition to the use of the variable. [1]
ln: Allow a trailing slash when creating a link to a directory.
In the 'ln source... directory' synopsis, the basename of each source
determines the name of the created link. Determine this using basename(3)
instead of strrchr(..., '/') which is incorrect if the pathname ends in a
slash.
The patch is somewhat changed to allow for basename(3) implementations that
change the passed pathname, and to fix the -w option's checking also.
The code to compare directory entries only applies to hard links, which
cannot be created to directories using ln.
Example:
ln -s /etc/defaults/ /tmp
This should create a symlink named defaults.
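The difference between the two approaches can be seen in a small userland sketch. The helper names and buffer sizes are hypothetical and this is not the ln(1) source; the path is copied first because basename(3) is allowed to modify its argument or return static storage.

```c
#include <assert.h>
#include <libgen.h>
#include <string.h>

/* Naive variant: the text after the last '/', empty for "dir/". */
static const char *
name_by_strrchr(const char *path)
{
	const char *p = strrchr(path, '/');

	return (p != NULL ? p + 1 : path);
}

/*
 * basename(3) variant: copy the path into a scratch buffer, then copy
 * the result out.  Returns 0 on success, -1 if a buffer is too small.
 */
static int
name_by_basename(const char *path, char *out, size_t outlen)
{
	char copy[1024];
	const char *base;

	if (strlen(path) >= sizeof(copy))
		return (-1);
	strcpy(copy, path);
	base = basename(copy);
	if (strlen(base) >= outlen)
		return (-1);
	strcpy(out, base);
	return (0);
}
```

For "/etc/defaults/" the strrchr() variant yields an empty name, while the basename(3) variant yields "defaults", matching the example above.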
Change usb devd events from fake attach to a notify. The ugen device is not a
proper device_t, so it faked the devctl event to appear like one; this is now a
notify, which allows more information to be passed.
We notify for both the device attach/detach and for each usb interface. A devd
rule can now match on the interface properties, including composite devices
which may have a uvideo interface and also usound and possibly uhid too.
An example to match a umass device with a scsi subclass and BBB protocol would be:
    notify 100 {
        match "system" "USB";
        match "subsystem" "INTERFACE";
        match "type" "ATTACH";
        match "intclass" "0x08";
        match "intsubclass" "0x06";
        match "intprotocol" "0x50";
        action ...
    };
The old attach devctl event has been retained for the moment to make merging to
8.1 easier. This was never compatible with 7.x or earlier due to the ugen regex
change needed.