When amd64 CPU cannot load segment descriptor during trap return to
usermode, it generates GPF, that is mirrored to user mode as SIGSEGV.
The offending register in mcontext should contain the value loading of
which generated the GPF, and it is so on i386. On amd64, we currently
report segment descriptor in tf_err, while segment register contains the
corrected value loaded by trap handler.
Fix the issue by behaving like i386, reloading segment register in trap
frame after signal frame is pushed onto user stack.
Noted and tested by: pho
Approved by: re (kensmith)
Separate the parallel scsi knowledge out of the core of the XPT, and
modularize it so that new transports can be created.
Add a transport for SATA
Add a periph+protocol layer for ATA
Add a driver for AHCI-compliant hardware.
Add a maxio field to CAM so that drivers can advertise their max
I/O capability. Modify various drivers so that they are insulated
from the value of MAXPHYS.
The new ATA/SATA code supports AHCI-compliant hardware, and will override
the classic ATA driver if it is loaded as a module at boot time or compiled
into the kernel. The stack now support NCQ (tagged queueing) for increased
performance on modern SATA drives. It also supports port multipliers.
ATA drives are accessed via 'ada' device nodes. ATAPI drives are
accessed via 'cd' device nodes. They can all be enumerated and manipulated
via camcontrol, just like SCSI drives. SCSI commands are not translated to
their ATA equivalents; ATA native commands are used throughout the entire
stack, including camcontrol. See the camcontrol manpage for further
details. Testing this code may require that you update your fstab, and
possibly modify your BIOS to enable AHCI functionality, if available.
This code is very experimental at the moment. The userland ABI/API has
changed, so applications will need to be recompiled. It may change
further in the near future. The 'ada' device name may also change as
more infrastructure is completed in this project. The goal is to
eventually put all CAM busses and devices until newbus, allowing for
interesting topology and management options.
Few functional changes will be seen with existing SCSI/SAS/FC drivers,
though the userland ABI has still changed. In the future, transports
specific modules for SAS and FC may appear in order to better support
the topologies and capabilities of these technologies.
The modularization of CAM and the addition of the ATA/SATA modules is
meant to break CAM out of the mold of being specific to SCSI, letting it
grow to be a framework for arbitrary transports and protocols. It also
allows drivers to be written to support discrete hardware without
jeopardizing the stability of non-related hardware. While only an AHCI
driver is provided now, a Silicon Image driver is also in the works.
Drivers for ICH1-4, ICH5-6, PIIX, classic IDE, and any other hardware
is possible and encouraged. Help with new transports is also encouraged.
Since the nfscl_getclose() function both decremented open counts and,
optionally, created a separate list of NFSv4 opens to be closed, it
was possible for the associated OpenOwner to be free'd before the Open
was closed. The problem was that the Open was taken off the OpenOwner
list before the Close RPC was done and OpenOwners can be free'd once the
list is empty. This patch separates out the case of doing the Close RPC
into a separate function called nfscl_doclose() and simplifies nfsrpc_doclose()
so that it closes a single open instead of a list of them. This avoids
removing the Open from the OpenOwner list before doing the Close RPC.
The control terminal revocation at the session leader exit does not
correctly checks for reclaimed vnode, possibly calling VOP_REVOKE for
such vnode. If the terminal is already revoked, or devfs mount was
forcibly unmounted, the revocation of doomed ctty vnode causes panic.
Reported and tested by: lstewart
Approved by: re (kensmith)
MFC after: 2 weeks
Extend the cn_flags field of the struct componentname to 64 bits to have
more space for the flags, that is too close to be exhausted. While changing
the KBI for name(9), use unsigned int for symlinks count.
Restore the segment registers and segment base MSRs for amd64 syscall
return path only when neither thread was context switched while
executing syscall code nor syscall explicitely modified LDT or MSRs.
Save segment registers in trap handlers before interrupts are enabled,
to not allow context switches to happen before registers are saved.
Use separated byte in pcb for indication of fast/full return, since
pcb_flags are not synchronized with context switches.
The change puts back syscall microbenchmark numbers that were slowed
down after commit of the support for LDT on amd64.
Reviewed by: jeff
Tested (and tested, and tested ...) by: pho
Approved by: re (kensmith)
There is an optimization in chmod(1), that makes it not to call chmod(2)
if the new file mode is the same as it was before; however, this
optimization must be disabled for filesystems that support NFSv4 ACLs.
Chmod uses pathconf(2) to determine whether this is the case - however,
pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't.
This change adds lpathconf(3) to make it possible to solve that problem
in a clean way.
Reviewed by: rwatson (earlier version)
Approved by: re (kib)
Fix regressions in return events of poll() on TTYs.
As pointed out, POLLHUP should be generated, even if it hasn't been
specified on input. It is also not allowed to return both POLLOUT and
POLLHUP at the same time.
Fix kernel panic, when ataahci driver is used on system with increased
MAXPHYS. Current ataahci driver memory allocation scheme includes only
64 items in DMA S/G table, and so not guarantied to support transactions
with more then 252K data.
Revert revisions 188839 and 188868. Use of the ioctl in geom_dev.c
is invalid because the ioctl happens without prior open. The ioctl
got introduced to provide backward compatibility for extended
partitions, but it ended up not being used because it didn't work
as expected. Since there are no consumers of the ioctl and the
implementation is broken, the best fix is to remove the code
entirely.
Increase HZ_VM from 10 to 100. While 10 hz saves cpu time
under VM environments, it's too slow for FreeBSD to work
properly. For example, ping at 10hz pings about every 600ms
instead of about every second.
Fix poll(2) and select(2) for named pipes to return "ready for read"
when all writers, observed by reader, exited. Use writer generation
counter for fifo, and store the snapshot of the fifo generation in the
f_seqcount field of struct file, that is otherwise unused for fifos.
Set FreeBSD-undocumented POLLINIGNEOF flag only when file f_seqcount is
equal to fifo' fi_wgen, and revert r89376.
Fix POLLINIGNEOF for sockets and pipes, and return POLLHUP for them.
Note that the patch does not fix not returning POLLHUP for fifos.
When pmap_change_attr() changes the PAT setting on a kernel mapping, it has
to simultaneously change the PAT setting for the same pages within the
direct map region. This may require the demotion of a 2MB page mapping and
the allocation of a page table page. This revision gives the highest
possible priority (VM_ALLOC_INTERRUPT) to this page allocation, so that
pmap_change_attr() is less likely to fail. (In general, kernel page table
page allocations have the highest priority, so this is not creating a new
precedent.)
(Demotion of 1GB page mappings within the direct map already specifies
VM_ALLOC_INTERRUPT to vm_page_alloc(), so only pmap_demote_pde() must be
changed.)
After the per-CPU IDT changes, the IDT vector of an interrupt could change
when the interrupt was moved from one CPU to another. If the interrupt was
enabled, then the old IDT vector needs to be disabled and the new IDT vector
needs to be enabled. This was mostly masked prior to the recent MSI changes
since in the older code almost all allocated IDT vectors were already enabled
and the enabled vectors on the BSP during boot covered enough of the IDT
range. However, after the MSI changes, MSI interrupts that were allocated
but not enabled (e.g. DRM with MSI) during boot could result in an allocated
IDT vector that wasn't enabled. The round-robin at the end of boot could
place another interrupt at the same IDT vector without enabling the IDT
vector causing trap 30 faults.
Fix this by explicitly disabling/enabling the old and new IDT vectors for
enabled interrupt sources when moving an interrupt between CPUs via the
pic_assign_cpu() method. While here, fix a bug in my earlier changes so
that an I/O APIC interrupt pin is left unchanged if ioapic_assign_cpu()
fails to allocate a new IDT vector and returns ENOSPC.
The new method of reading the mac address from the
RAR(0) register does not work on this old adapter,
provide a local routine that does it the older way.
- pkg_install is maintained by portmgr.
- BSD.x11{,-4}.dist aren't used anymore and BSD.local.dist now lives
in ports/Templates/. Most people apparently missed that move and still
commit to the src copy, so I'll have to remove it eventually but for
now, the MAINTAINERS line can go.
In the current code, rdlock_count is not correctly handled for some cases.
The most notable is that it is not bumped in rwlock_rdlock_common() when
the hard path (__thr_rwlock_rdlock()) returns successfully.
This can lead to deadlocks in libthr when rwlocks recursion in read mode
happens.
Fix the interested parts by correctly handling rdlock_count.
PR: threads/136345
Reported by: rink
Tested by: rink
Reviewed by: jeff
Approved by: re (kib)
MFC: 2 weeks
This addresses some issues with my earlier -R fix that
were pointed out by Brooks Davis and Alexey Dokuchaev:
* It now tries to lookup arguments as names first, then tries
to parse them as numbers. In particular, this makes the
behavior consistent with POSIX conventions when usernames
consist entirely of digits.
* It now uses strtoul() for the numeric parsing.
Finally, I've included an update to the test harness
to exercise the new numeric cases for -R.
PAE adds another level to the i386 page table. This level is a small
4-entry table that must be located within the first 4GB of RAM. This
requirement is met by defining an UMA zone with a custom back-end
allocator function. This revision makes two changes to this back-end
allocator function: (1) It replaces the use of contigmalloc() with the
use of kmem_alloc_contig(). This eliminates "double accounting", i.e.,
accounting by both the UMA zone and malloc tags. (I made the same
change for the same reason to the zones supporting jumbo frames a week
ago.) (2) It passes through the "wait" parameter, i.e., M_WAITOK,
M_ZERO, etc. to kmem_alloc_contig() rather than ignoring it.
pmap_init() calls uma_zalloc() with both M_WAITOK and M_ZERO. At the
moment, this is harmless only because the default behavior of
contigmalloc()/kmem_alloc_contig() is to wait and because pmap_init()
doesn't really depend on the memory being zeroed.
The back-end allocator function in the Xen pmap is dead code. I am
changing it nonetheless because I don't want to leave any "bad examples"
in the source tree for someone to copy at a later date.
sam [Sun, 5 Jul 2009 18:17:37 +0000 (18:17 +0000)]
Add ieee80211_ageq; a facility for staging packets that require
long-term work before they can be serviced. Packets are tagged and
assigned an age (in seconds) at the point they are added to the
queue. If a packet is not retrieved before it's age expires it is
reclaimed. Tagging can take two forms: a reference to an ieee80211_node
(as happens in the tx path) or an opaque token in cases where there
is no reference or the node structure is not stable (i.e. it's going
to be destroyed).
o add ic_stageq to replace the per-node wds staging queue used for
dynamic wds
o add ieee80211_mac_hash for building ageq tokens; this computes a
32-bit hash from an 802.11 mac address (copied from the bridge)
o while here fix a stray ';' noticed in IEEE80211_PSQ_INIT
- Increase dynamic range of filter coefficients from 28bit to 30bit.
This cause dramatic effect in overall precision and conversion quality
by pushing down most aliasing artifacts around -180 dB.
- Guard against possible 64bit overflow during accumulation process by
slightly normalize and saturate sample and coefficient multiplication,
possible during extreme 32bit downsampling (eg. 380KHz -> 8KHz) with
custom preset that require more than ~7000 taps filter (which is
overkill).
- Add knobs through FEEDER_RATE_PRESETS to set dynamic range of filter
coefficients/accumulator and prefered polynomial interpolator:
COEFFICIENT_BIT:X
(where 1 <= X <= 30, default: 30)
ACCUMULATOR_BIT:X
(where 32 <= X <=64, default: 58)
sam [Sun, 5 Jul 2009 17:59:19 +0000 (17:59 +0000)]
Revamp 802.11 action frame handling:
o add a new facility for components to register send+recv handlers
o ieee80211_send_action and ieee80211_recv_action now use the registered
handlers to dispatch operations
o rev ieee80211_send_action api to enable passing arbitrary data
o rev ieee80211_recv_action api to pass the 802.11 frame header as it may
be difficult to locate
o update existing IEEE80211_ACTION_CAT_BA and IEEE80211_ACTION_CAT_HT handling
o update mwl for api rev
sam [Sun, 5 Jul 2009 17:45:48 +0000 (17:45 +0000)]
Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent
Reviewed by: bde, imp, marcel
Approved by: re (kensmith)
Add a new options (-s) that, when specified, skips the question about
adjusting the clock to UTC.
That avoids to write on /etc/wall_cmos_clock which is useful in some
cases (example: host user in a jail).
Sponsored by: Sandvine Incorporated
Initially submitted by: Matt Koivisto <mkoivisto at sandvine dot com>
Approved by: re (kib)
When forking a vm space that has wired map entries, do not forget to
charge the objects created by vm_fault_copy_entry. The object charge
was set, but reserve not incremented.
acpi_hp.c:
- sysctl dev.acpi_hp.0.verbose to toggle debug output
- A modification so this can deal with different array lengths
when reading the CMI BIOS - now it works ok on HP Compaq nx7300
as well.
- Change behaviour to query only max_instance-1 CMI BIOS instances,
because all HPs seen so far are broken in that respect
(or there is a fundamental misunderstanding on my side, possible
as well). This way a disturbing ACPI Error Field exceeds Buffer
message is avoided.
- New bit to set on dev.acpi_hp.0.cmi_detail (0x8) to
also query the highest guid instance of CMI bios
acpi_hp.4:
- Document dev.acpi_hp.0.verbose sysctl in man page
- Document new bit for dev.acpi_hp.0.cmi_detail
- Add a section to manpage about hardware that has been reported
to work ok
Submitted by: Michael Gmelin <freebsdusb at bindone.de>
Approved by: re (kib)
MFC after: 2 weeks
This fixes bsdcpio's -R option to accept numeric
user or group Ids as well as user or group names.
In particular, this fixes freesbie2, which uses
-R 0:0 to copy a bunch of files so that the result
will be owned by root.
Also fixes a related bug that mixed-up the uid
and gid specified by -R when in passthrough mode.
Thanks to Dominique Goncalves for reporting this
regression.
In vn_vget_ino() and their inline equivalents, mnt_ref() the mount point
around the sequence that drop vnode lock and then busies the mount point.
Not having vlocked node or direct reference to the mp allows for the
forced unmount to proceed, making mp unmounted or reused.
Tested by: pho
Reviewed by: jeff
Approved by: re (kensmith)
MFC after: 2 weeks
Slightly increase amount of bandwidth of resampling filter for
feeder_rate_quality=3. This have the benefit of reducing aliasing
artifacts due to alias masking.
Spectrogram analysis:
o Old preset (100:36:0.90)
http://people.freebsd.org/~ariff/z_comparison/z_q3_old.png
o New preset (100:36:0.92):
http://people.freebsd.org/~ariff/z_comparison/z_q3_new.png
Clean up a number of aspects of token generation from audit arguments to
system calls:
- Centralize generation of argument tokens for VM addresses in a macro,
ADDR_TOKEN(), and properly encode 64-bit addresses in 64-bit arguments.
- Fix up argument numbers across a large number of syscalls so that they
match the numeric argument into the system call.
- Don't audit the address argument to ioctl(2) or ptrace(2), but do keep
generating tokens for mmap(2), minherit(2), since they relate to passing
object access across execve(2).
jeff [Wed, 1 Jul 2009 20:43:46 +0000 (20:43 +0000)]
- Use fd_lastfile + 1 as the upper bound on nd. This is more correct than
using the size of the descriptor array.
- A lock is not needed to fetch fd_lastfile. The results are stale the
instant it is dropped.
- Use a private mutex pool for select since the pool mutex is not used
as a leaf.
- Fetch the si_mtx pointer first before resorting to hashing to compute
the mutex address.
Fix a panic which (reportedly) can happen when unmounting a filesystem
with I/O requests in flight on kernels compiled with "options INVARIANTS".
Also, make it obvious it's not right to call g_valid_obj() (and macros
using it, e.g. G_VALID_CONSUMER()) without topology lock held.
DPCPU area was not properly mapped into kernel VA space, which caused page
fault on the first DPCPU access. This patch fixes the problem by mapping DPCPU
area into kernel VA space.
Submitted by: Michal Hajduk, Piotr Ziecik
Reviewed by: cognet, stas
Approved by: re (kib)
Obtained from: Semihalf
Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
to a specific CPU to return an error value instead of void, thus allowing
it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
destination CPU, fail the request with ENOSPC rather than panicing.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
on the first interrupt in a group. Moving the first interrupt in a group
moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
interrupts. Previously, binding an interrupt to a CPU only performed a
privilege check if the interrupt had an interrupt thread. Interrupts
without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.
When auditing unmount(2), capture FSID arguments as regular text strings
rather than as paths, which would lead to them being treated as relative
pathnames and hence confusingly converted into absolute pathnames.
Capture flags to unmount(2) via an argument token.
Approved by: re (audit argument blanket)
MFC after: 3 days
When unmounting an NFS mount using sec=krb5[ip], the umount system
call could get hung sleeping on "gsssta" if the credentials for a user
that had been accessing the mount point have expired. This happened
because rpc_gss_destroy_context() would end up calling itself when the
"destroy context" RPC was attempted, trying to refresh the credentials.
This patch just checks for this case in rpc_gss_refresh() and returns
without attempting the refresh, which avoids the recursive call to
rpc_gss_destroy_context() and the subsequent hang.
Reviewed by: dfr
Approved by: re (Ken Smith), kib (mentor)
Make sure that cr_error is set to ESHUTDOWN when closing the connection.
This is normally done by a loop in clnt_dg_close(), but requests that aren't
in the pending queue at the time of closing, don't get set. This avoids a
panic in xdrmbuf_create() when it is called with a NULL cr_mrep if
cr_error doesn't get set to ESHUTDOWN while closing.
Reviewed by: dfr
Approved by: re (Ken Smith), kib (mentor)
With NFSv4 ACLs, it is possible that applying a mode to an ACL which
is identical to the mode computed from that ACL will modify the ACL.
For example, mode computed from the following ACL is 0600:
In chmod(1) utility, there is an optimisation, which makes it not
call chmod(2) if the mode of the file is the same as the new mode.
Disable that optimisation for files which may have NFSv4 ACLs.
- Fix the bug in write(2) called with incorrect parameters resulting in writes
always started from the start of the packet.
- Fix usage string (multiple addresses can be specified).
- Make the source more style(9) compliant.
- Improve error reporting (do not silently fail if something goes
wrong).
- Make functions static.
- Use warns level 6.
Approved by: re (kib)
Discussed with: Marc Balmer <marc@msys.ch>, brian, mbr