kib [Wed, 6 Jun 2012 16:30:16 +0000 (16:30 +0000)]
Improve handling of uiomove(9) errors for the NFS client.
Do not brelse() the buffer unconditionally with BIO_ERROR set if
uiomove() failed. The brelse() treats most buffers with BIO_ERROR as
B_INVAL, dropping their content. Instead, if the write request
covered the whole buffer, remember the cached state and brelse() with
BIO_ERROR set only if the buffer was not cached previously.
Update the buffer dirtyoff/dirtyend based on the progress recorded by
uiomove() in passed struct uio, even in the presence of
error. Otherwise, usermode could see changed data in the backed pages,
but later the buffer is destroyed without write-back.
If uiomove() failed for IO_UNIT request, try to truncate the vnode
back to the pre-write state, and rewind the progress in passed uio
accordingly, following the FFS behaviour.
glebius [Wed, 6 Jun 2012 09:36:52 +0000 (09:36 +0000)]
Merge revision 1.715 from OpenBSD:
date: 2010/12/24 20:12:56; author: henning; state: Exp; lines: +3 -3
in pf_src_connlimit, the indices to sk->addr were swapped.
tracked down and diff sent by Robert B Mills <rbmills at sdf.lonestar.org>
thanks, very good work! ok claudio
mav [Wed, 6 Jun 2012 06:52:51 +0000 (06:52 +0000)]
ATA/SATA controllers have no idea about protocol of the connected device
until transport will do some probe actions (at least soft reset).
Make ATA/SATA SIMs to not report bogus and confusing PROTO_ATA protocol.
Make ATA/SATA transport to fill that gap by reporting protocol to SIM with
XPT_SET_TRAN_SETTINGS and patching XPT_GET_TRAN_SETTINGS results if needed.
imp [Wed, 6 Jun 2012 06:19:52 +0000 (06:19 +0000)]
Enhance the Atmel SoC chip identification routines to account for more
SoC variants. Fold the AT91SAM9XE chips into the AT91SAM9260
handling, where appropriate. The following SoCs/SoC families are recognized:
at91cap9, at91rm9200, at91sam9260, at91sam9261, at91sam9263,
at91sam9g10, at91sam9g20, at91sam9g45, at91sam9n12, at91sam9rl,
at91sam9x5
and the following variations are also recognized:
at91rm9200_bga, at91rm9200_pqfp, at91sam9xe, at91sam9g45, at91sam9m10,
at91sam9g46, at91sam9m11, at91sam9g15, at91sam9g25, at91sam9g35,
at91sam9x25, at91sam9x35
This is only the identification routine: no additional Atmel devices
are supported at this time.
# With these changes, I'm able to boot to the point of identification
# on a few different Atmel SoCs that we don't yet support using the
# KB920X config file -- someday tht will be an ATMEL config file...
emax [Tue, 5 Jun 2012 21:35:47 +0000 (21:35 +0000)]
Add a very simple debug tool that would dump list of interfaces,
addresses on each interface, and, associated refcounter. I found
it handy to check for address refcounter leaks.
gnn [Tue, 5 Jun 2012 18:58:05 +0000 (18:58 +0000)]
Add DTrace's io.d, which handles tranlsations for file, buffer and
device info structures as well as the fds[] array. This is a raw
version of the file, unmodified, to be used as a baseline.
emax [Tue, 5 Jun 2012 18:48:02 +0000 (18:48 +0000)]
Before it gets lost in the noise.
Put a bandaid to prevent ixgbe(4) from completely locking up the system
under high load. Our platform has a few CPU cores and a single active
ixgbe(4) port with 4 queues. Under high enough traffic load, at about
7.5GBs and 700,000 packets/sec (outbound), the entire system would
deadlock. What we found was that each CPU was in an endless loop on a
different ix taskqueue thread. The OACTIVE flag had gotten set on each
queue, and the ixgbe_handle_queue() function was continuously rescheduling
itself via the taskqueue_enqueue. Since all CPUs were busy with their
taskqueue threads, the ixgbe_local_timer() function couldn't run to clear
the OACTIVE flag.
ache [Tue, 5 Jun 2012 16:16:33 +0000 (16:16 +0000)]
1) Although unpublished version of standard
http://austingroupbugs.net/view.php?id=385#c713
(Resolved state) recommend this way for the current standard (called
"earlier" in the text)
"However, earlier versions of this standard did not require this, and the
same example had to be written as:
// buf was obtained by malloc(buflen)
ret = write(fd, buf, buflen);
if (ret < 0) {
int save = errno;
free(buf);
errno = save;
return ret;
}
"
from feedback I have for previous commit it seems that many people prefer
to avoid mass code change needed for current standard compliance
and prefer to track unpublished standard instead, which requires now
that free() itself must save errno, not its usage code.
So, I back out "save errno across free()" part of previous commit,
and will fill PR for changing free() isntead.
adrian [Tue, 5 Jun 2012 06:03:55 +0000 (06:03 +0000)]
Mostly revert previous commit(s). After doing a bunch of local testing,
it turns out that it negatively affects performance. I'm stil investigating
exactly why deferring the IO causes such negative TCP performance but
doesn't affect UDP preformance.
Leave the ath_tx_kick() change in there however; it's going to be useful
to have that there for if_transmit() work.
adrian [Tue, 5 Jun 2012 03:14:49 +0000 (03:14 +0000)]
Create a function - ath_tx_kick() - which is called where ath_start() is
called to "kick" along TX.
For now, schedule a taskqueue call.
Later on I may go back to the direct call of ath_rx_tasklet() - but for
now, this will do.
I've tested UDP and TCP TX. UDP TX still achieves 240MBit, but TCP
TX gets stuck at around 100MBit or so, instead of the 150MBit it should
be at. I'll re-test with no ACPI/power/sleep states enabled at startup
and see what effect it has.
This is in preparation for supporting an if_transmit() path, which will
turn ath_tx_kick() into a NUL operation (as there won't be an ifnet
queue to service.)
Tested:
* AR9280 STA
TODO:
* test on AR5416, AR9160, AR928x STA/AP modes
dougb [Mon, 4 Jun 2012 22:11:20 +0000 (22:11 +0000)]
Upgrade to 9.8.3-P1, the latest from ISC. This version contains
a critical bugfix:
Processing of DNS resource records where the rdata field is zero length
may cause various issues for the servers handling them.
Processing of these records may lead to unexpected outcomes. Recursive
servers may crash or disclose some portion of memory to the client.
Secondary servers may crash on restart after transferring a zone
containing these records. Master servers may corrupt zone data if the
zone option "auto-dnssec" is set to "maintain". Other unexpected
problems that are not listed here may also be encountered.
All BIND users are strongly encouraged to upgrade.
adrian [Mon, 4 Jun 2012 22:01:12 +0000 (22:01 +0000)]
Migrate the TX path to a taskqueue for now, until a better way of
implementing parallel TX and TX/RX completion can be done without
simply abusing long-held locks.
Right now, multiple concurrent ath_start() entries can result in
frames being dequeued out of order. Well, they're dequeued in order
fine, but if there's any preemption or race between CPUs between:
* removing the frame from the ifnet, and
* calling and runningath_tx_start(), until the frame is placed on a
software or hardware TXQ
Then although dequeueing the frame is in-order, queueing it to the hardware
may be out of order.
This is solved in a lot of other drivers by just holding a TX lock over
a rather long period of time. This lets them continue to direct dispatch
without races between dequeue and hardware queue.
Note to observers: if_transmit() doesn't necessarily solve this.
It removes the ifnet from the main path, but the same issue exists if
there's some intermediary queue (eg a bufring, which as an aside also
may pull in ifnet when you're using ALTQ.)
So, until I can sit down and code up a much better way of doing parallel
TX, I'm going to leave the TX path using a deferred taskqueue task.
What I will likely head towards is doing a direct dispatch to hardware
or software via if_transmit(), but it'll require some driver changes to
allow queues to be made without using the really large ath_buf / ath_desc
entries.
TODO:
* Look at how feasible it'll be to just do direct dispatch to
ath_tx_start() from if_transmit(), avoiding doing _any_ intermediary
serialisation into a global queue. This may break ALTQ for example,
so I have to be delicate.
* It's quite likely that I should break up ath_tx_start() so it
deposits frames onto the software queues first, and then only fill
in the 802.11 fields when it's being queued to the hardware.
That will make the if_transmit() -> software queue path very
quick and lightweight.
* This has some very bad behaviour when using ACPI and Cx states.
I'll do some subsequent analysis using KTR and schedgraph and file
a follow-up PR or two.
ache [Mon, 4 Jun 2012 21:34:49 +0000 (21:34 +0000)]
1) IEEE Std 1003.1-2008, "errno" section, is explicit that
"The setting of errno after a successful call to a function is
unspecified unless the description of that function specifies that
errno shall not be modified."
However, free() in IEEE Std 1003.1-2008 does not mention its interaction
with errno, so MAY modify it after successful call
(it depends on particular free() implementation, OS-specific, etc.).
So, save errno across free() calls to make code portable and
POSIX-conformant.
marius [Mon, 4 Jun 2012 20:56:40 +0000 (20:56 +0000)]
The loaddev environment variable is not modifiable once set, so it is not
update for ZFS. It seems that this does not really affect anything except
the help command. Nevertheless, rearrange things so loaddev is set only
once in all cases in order to get it right.
Pointed out by: avg
marius [Mon, 4 Jun 2012 20:45:33 +0000 (20:45 +0000)]
The workaround added in r151650 for handling firmwares that don't allow
a single device to be opened multiple times concurrently unfortunately
isn't sufficient with ZFS. This is due to the fact, that ZFS may open
different partitions of a single device simultaneously. So the best we
can do in this case is to cache the lastly used device path and close
and open devices in ofwd_strategy() as needed.
PR: 165025
Submitted by: Gavin Mu
MFC after: 1 week
dim [Mon, 4 Jun 2012 20:36:11 +0000 (20:36 +0000)]
Fix build of aicasm when CC=clang. This was due to a side-effect of the
EARLY_BUILD macro: the -Qunused-arguments flag isn't passed anymore when
building this particular program. However, with clang 3.1 and -Werror,
such unused argument warnings are flagged as errors, causing buildkernel
to fail at this stage, due to the -nostdinc flag passed during linking.
Since the -nostdinc flag isn't actually needed, just remove it.
delphij [Mon, 4 Jun 2012 18:02:09 +0000 (18:02 +0000)]
Replace the use of wall clock time with monotonically increasing
clock. In general, gettimeofday() is not appropriate interface
when accounting for elasped time because it can go backward, in
which case the policy code could errornously consider the limit
as exceeded.
MFC after: 1 week
Reported by: Mahesh Arumugam
Submitted by: Dorr H. Clark via gnn
Sponsored by: Citrix / NetScaler
zml [Mon, 4 Jun 2012 16:04:01 +0000 (16:04 +0000)]
Fix DTrace TSC skew calculation:
The skew calculation here is exactly backwards. We were able to repro
it on a multi-package ESX server running a FreeBSD VM, where the TSCs
can be pretty evil.
MFC after: 1 week
Submitted by: Jeff Ford <jeffrey.ford2@isilon.com>
Reviewed by: avg, gnn
glebius [Mon, 4 Jun 2012 12:49:21 +0000 (12:49 +0000)]
Optimise kern_sendfile(): skip cycling through the entire mbuf chain in
m_cat(), storing pointer to last mbuf in chain in local variable and
attaching new mbuf to the end of chain.
Submitter reports that CPU load dropped for > 10% on a web server
serving large files with this optimisation.
imp [Mon, 4 Jun 2012 04:24:59 +0000 (04:24 +0000)]
Eliminate the now-unused AT91C_MASTER_CLOCK option and change the one
place in the source it was used to the more correct AT91C_MAIN_CLOCK.
Sort AT91C_MAIN_CLOCK into a better location in the options.arm file.
alc [Mon, 4 Jun 2012 03:51:08 +0000 (03:51 +0000)]
Various small changes to PV entry management:
Constify pc_freemask[].
pmap_pv_reclaim()
Eliminate "freemask" because it was a pessimization. Add a comment about
the resident count adjustment.
free_pv_entry() [i386 only]
Merge an optimization from amd64 (r233954).
get_pv_entry()
Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
(The right strategy needs more thought. Moreover, there were unintended
differences between the amd64 and i386 implementation.)
dim [Sun, 3 Jun 2012 20:35:41 +0000 (20:35 +0000)]
During buildworld and buildkernel, define EARLY_BUILD in the earlier
stages (build-tools, cross-tools, etc) of the build, so we can detect in
bsd.*.mk whether to pass compiler-specific flags to ${CC}.
In particular, this commit will allow using WITH_CLANG_IS_CC when the
base compiler is still gcc, and when ${CC}, ${CXX} and ${CPP} are left
at their defaults. The early stages will then be built using gcc, and
no clang-specific flags will be passed to it. The later stages will be
built as usual.
The EARLY_BUILD define can also serve other uses, such as building the
world stage C++ executables with libc++ instead of libstdc++: during the
early build stages, we cannot assume libc++ is already available, so we
must still build with libstdc++ at that time.
imp [Sun, 3 Jun 2012 18:34:32 +0000 (18:34 +0000)]
Minor rearrangement of the locore <-> initarm interface. Pass in a
structure with the first 4 registers to allow a wider range of boot
loaders to work. Future commits will make use of this to centralize
support for the different loaders.
avg [Sun, 3 Jun 2012 08:30:00 +0000 (08:30 +0000)]
cpucontrol: use CPUCTL_UPDATE ioctl on correct file descriptor
I guess that means that microcode update has never worked for AMD CPUs.
Please also note that only older AMD CPUs and micrcode file format are
supported anyway (pre 10h family).
emax [Sun, 3 Jun 2012 07:36:59 +0000 (07:36 +0000)]
Plug reference leak.
Interface routes are refcounted as packets move through the stack,
and there's garbage collection tied to it so that route changes can
safely propagate while traffic is flowing. In our setup, we weren't
changing or deleting any routes, but the refcounting logic in
ip6_input() was wrong and caused a reference leak on every inbound
V6 packet. This eventually caused a 32bit overflow, and the resulting
0 value caused the garbage collection to run on the active route.
That then snowballed into the panic.
marius [Sun, 3 Jun 2012 01:07:55 +0000 (01:07 +0000)]
- Now that the DataFlash related drivers work properly (at91_spi(4) since
r236495 and at45d(4) since r236496), enable them by default.
- Sort BOOTP options.
marius [Sun, 3 Jun 2012 01:00:55 +0000 (01:00 +0000)]
- Loop up to 3 seconds when waiting for a device to get ready. [1]
- Make the device description match the driver name.
- Identify the chip variant based on the JEDEC and use that information
to use the proper values for page count, offset and size instead of
hardcoding a AT45DB642x with 2^N byte page support disabled.
- Take advantage of bioq_takefirst().
- Given that CONTINUOUS_ARRAY_READ_HF (0x0b) command isn't even mentioned
in Atmel's DataFlash Application Note, as suggested by the previous
comment may not work on all all devices and actually doesn't properly
on at least AT45DB321D (JEDEC 0x1f2701), rewrite at45d_task() to use
CONTINUOUS_ARRAY_READ (0xe8) for reading instead. This rewrite is laid
out in a way allowing to easily add support for BIO_DELETE later on.
- Add support for reads and writes not starting on a page boundary.
- Verify the flash content after writing.
- Let at45d_task() gracefully handle errors on SPI transfers and the
device not becoming ready afterwards again. [1]
- Use DEVMETHOD_END. [1]
- Use NULL instead of 0 for pointers. [1]
marius [Sun, 3 Jun 2012 00:54:10 +0000 (00:54 +0000)]
- Prepend the device description with "AT91" to reflect its nature. [1]
- Move DMA tag and map creature to at91_spi_activate() where the other
resource allocation also lives. [1]
- Flesh out at91_spi_deactivate(). [1]
- Work around the "Software Reset must be Written Twice" erratum.
- For now, run the bus at the slowest speed possible in order to work
around data corruption on transit even seen with 9 MHz on ETHERNUT5
(15 MHz maximum) and AT45DB321D (20 MHz maximum). This also serves as
a poor man's work-around for the "NPCSx rises if no data data is to be
transmitted" erratum of RM9200. Being able to use the appropriate bus
speed would require:
1) Adding a proper work-around for the RM9200 bug consisting of taking
the chip select control away from the SPI peripheral and managing it
directly as a GPIO line.
2) Taking the maximum frequencies supported by the actual board and the
slave devices into account and basing the whole thing on the master
clock instead of hardcoding a divisor as previously done.
3) Fixing the above mentioned data corruption.
- KASSERT that TX/RX command and data sizes match on transfers.
- Introduce a mutex ensuring that only one child device is running a SPI
transfer at a time. [1]
- Add preliminary, #ifdef'ed out support for setting the chip select. [1]
- Use the RX instead of the TX commando size when setting up the RX side
of a transfer.
- For controllers having SPI_SR_TXEMPTY, i.e. !RM9200, also wait for the
completion of the TX part of transfers before stopping the whole thing
again.
- Use DEVMETHOD_END. [1]
- Use NULL instead of 0 for pointers. [1, partially]