MFC r203261 (by marcel):
Export the UUID of the partition in the XML. The partition UUID is used
by EFI's device path to identify a partition. In order for FreeBSD to
add EFI boot options, proper device paths need to be constructed.
kib [Sat, 2 Oct 2010 17:41:47 +0000 (17:41 +0000)]
MFC r212824:
Adopt the deferring of object deallocation for the deleted map entries
on map unlock to the lock downgrade and later read unlock operation.
MFC r212868 (by alc) [1]:
Make refinements to r212824. Redo the implementation of
vm_map_unlock_and_wait().
MFC r212732:
Fix panic, when due to some kind of congestion on FIS-based switching
port multiplier some command triggers false positive timeout, but then
completes normally.
In the case of non-sequential mappings, ofw_mapmem() could ask Open
Firmware to map a memory region with negative length, causing crashes
and Undefined Behavior. Add the appropriate check to make the behavior
defined.
Fix a problem where device detection would work unreliably on Serverworks
K2 SATA controllers. The chip's status register must be read first, and
as a long, for other registers to be correctly updated after a command, and
this includes the command sequence in device detection as well as the
previously handled case after interrupts. While here, clean up some
previous hacks related to this controller.
MFC r211913:
Do not allocate multicast array memory in multicast filter
configuration function. For failed memory allocations, em(4)/lem(4)
called panic(9) which is not acceptable on production box.
igb(4)/ixgb(4)/ix(4) allocated the required memory in stack which
consumed 768 bytes of stack memory which looks too big.
To address these issues, allocate multicast array memory in device
attach time and make multicast configuration success under any
conditions. This change also removes the excessive use of memory in
stack.
MFC r211909:
If em(4) failed to allocate RX buffers, do not call panic(9).
Just showing some buffer allocation error is more appropriate
action for drivers. This should fix occasional panic reported on
em(4) when driver encountered resource shortage.
MFC r212755:
Fix incorrect RX BD producer updates. The producer index was
already updated after allocating mbuf so driver had to use the last
index instead of using next producer index. This should fix driver
hang which may happen under high network load.
Reported by: Igor Sysoev <is <> rambler-co dot ru>, Vlad Galu <dudu <> dudu dot ro>
Tested by: Igor Sysoev <is <> rambler-co dot ru>, Vlad Galu <dudu <> dudu dot ro>
MFC 212902:
Tweak the stats exported by the e1000 drivers:
- Add a single sysctl procedure to all three drivers to read an arbitrary
register (the register is passed as arg2). Use it to replace existing
routines in igb(4) that used a separate routine for each register, and
to add support for missing stats in em(4) and lem(4).
- Move the 'rx_overruns' and 'watchdog_timeouts' stats out of the MAC stats
section as they are driver stats, not MAC counters.
- Simplify the code that creates per-queue stats in igb(4) to use a single
loop and remove duplicated code.
- Properly read all 64 bits of the 'good octets received/transmitted' in
em(4) and lem(4).
- Actually read the interrupt count registers in em(4), and drop the
'host to card' sysctl stats from em(4) as they are not implemented in
any of the hardware this driver supports.
- Restore several stats to em(4) that were lost in the earlier stats
conversion including per-queue stats.
- Export several MAC stats in em(4) that were exported in igb(4) but not
in em(4).
- Export stats in lem(4) using individual sysctls as in em(4) and igb(4).
MFC: r212834
Fix nfsrv_freeallnfslocks() in the experimental NFSv4 server so that
it frees local locks correctly upon close. In order for
nfsrv_localunlock() to work correctly, the lock can no longer be in
the lockowner's stateid list. As such, nfsrv_freenfslock() has to
be called before nfsrv_localunlock(), to get rid of the lock structure
on the lockowner's stateid list. This only affected operation when
local locks (vfs.newnfs.enable_locallocks=1) are enabled, which is
not the default at this time.
MFC: r212833
Fix the experimental NFSv4 server so that it performs local VOP_ADVLOCK()
unlock operations correctly. It was passing in F_SETLK instead of
F_UNLCK as the operation for the unlock case. This only affected
operation when local locking (vfs.newnfs.enable_locallocks=1) was enabled.
MFC: r212443
This patch applies one of the two fixes suggested by
zack.kirsch at isilon.com for a race between nfsrv_freeopen()
and nfsrv_getlockfile() in the experimental NFS server that
he found during testing. Although nfsrv_freeopen() holds a
sleep lock on the lock file structure when called with
cansleep != 0, nfsrv_getlockfile() could still search the
list, once it acquired the NFSLOCKSTATE() mutex. I believe
that acquiring the mutex in nfsrv_freeopen() fixes the race.
MFC: r212439
Fix the NFSVNO_CMPFH() macro in the experimental NFS server so
that it works correctly for ZFS file handles. It is possible to
have two ZFS file handles that differ only in the bytes in the
fid_reserved field of the generic "struct fid" and comparing the
bytes in fid_data didn't catch this case. This patch changes the
macro to compare all bytes of "struct fid".
MFC: r212362
Fix the experimental NFS client so that it doesn't panic when
NFSv2,3 byte range locking is attempted. A fix that allows the
nlm_advlock() to work with both clients is in progress, but
may take a while. As such, I am doing this commit so that
the kernel doesn't panic in the meantime.
Don't attempt to write label with GEOM_BSD based method if the class is
not available. This improves error reporting when bsdlabel(8) is unable
to open a device for writing. If GEOM_BSD was unavailable, only a rather
obscure error message "Class not found" was printed.
In setusercontext(), do not apply user settings unless running as the
user in question (usually but not necessarily because we were called
with LOGIN_SETUSER). This plugs a hole where users could raise their
resource limits and expand their CPU mask.
MFC r211765:
Remove unnecessary controller reinitialization.
CAM filter handling was rewritten long time ago so it should not
require controller reinitialization.
MFC r211764:
Add PNP id for Compex RL2000.
I'm not sure whether adding this logical id is correct or not
because Compex RL2000 is in the list of supported hardware list.
I guess the Compex RL2000 could be PCI variant while the controller
in question is ISA controller. It seems PNP compat id didn't match
or it had multiple compat ids so isa_pnp_probe() seemed to return
ENOENT.
MFC r211717:
Implement basic WOL support. Note, not all xl(4) controllers
support WOL. Some controllers require additional 3-wire auxiliary
remote wakeup connector to draw power. More recent xl(4)
controllers may not need the wakeup connector though.
MFC r211716:
Move xl_reset() to xl_init_locked(). This will make driver
initialize controller from a known good state. Previously driver
used to issue controller reset while TX/RX DMA are in progress.
I guess resetting controller in active TX/RX DMA cycle is to ensure
stopping I/Os in xl_shutdown(). I remember some buggy controllers
didn't respond with stop command if controller is under high
network load at the time of shutdown so resetting controller was
the only safe way to stop the I/Os. However, from my experiments,
controller always responded with stop command under high network
load so I think it's okay to remove the xl_reset() in
device_shutdown handler.
Resetting controller also will clear configured RX filter which
in turn will make WOL support hard because driver have to reprogram
RX filter in WOL handler as well as setting station address.
MFC r211596:
It seems all Broadcom controllers have a bug that can generate UDP
datagrams with checksum value 0 when TX UDP checksum offloading is
enabled. Generating UDP checksum value 0 is RFC 768 violation.
Even though the probability of generating such UDP datagrams is
low, I don't want to see FreeBSD boxes to inject such datagrams
into network so disable UDP checksum offloading by default. Users
still override this behavior by setting a sysctl variable or loader
tunable, dev.bge.%d.forced_udpcsum.
I have no idea why this issue was not reported so far given that
bge(4) is one of the most commonly used controller on high-end
server class systems. Thanks to andre@ who passed the PR to me.
Attempt to autodetect the cype of chipset, rather than storing this
within the device table. This code uses the same algorithm as used in
the Linux, NetBSD and DragonflyBSD driver.
While investigating this, it became apparent that the Linux driver
always initialises the device, and not just in the PL2303HX case.
Change uplcom(4) to do the same.
This change allows us to synchronize our device ID list with Linux and
NetBSD, without requiring knowledge of the chipset in use.
mdoc: move remaining sections into consistent order
This pertains mostly to FILES, HISTORY, EXIT STATUS and AUTHORS sections.
Found by: mdocml lint run
Reviewed by: ru
r210368:
Actually, only the fullsync mode is implemented, not memsync mode.
Correct manual page.
r210702:
Spelling fixes.
r210869:
Add an argument to the proto_register() function which allows protocol to
declare it is the default and be placed at the end of the queue so it is
checked last.
r210870:
Now that TCP will be checked last we don't need any knowledge about other
protocols.
r210872:
Mark two more places that we won't reach.
r210873:
Keep $FreeBSD$ in __FBSDID() only for C files.
r210875:
Problem with assertion is that it logs on stderr. Add two macros:
PJDLOG_ASSERT() and PJDLOG_VERIFY() that will check the given condition
and log the problem where appropriate. The difference between those
two is that PJDLOG_VERIFY() always work and PJDLOG_ASSERT() can be
turned off by defining NDEBUG.
r210876:
Assert that various buffers we are large enough.
r210879:
- Use pjdlog_exitx() to log errors and exit instead of errx().
- Use 'unable to' (instead of 'cannot') consistently.
r210880:
Reset signal handlers after fork().
r210881:
Allow to use 'none' keywork as remote address in case second cluster node
is not setup yet.
r210882:
Make control_set_role() more public. We will need it soon.
r210883:
Prepare configuration parsing code to be called multiple times:
- Don't exit on errors if not requested.
- Don't keep configuration in global variable, but allocate memory for
configuration.
- Call yyrestart() before yyparse() so that on error in configuration file
we will start from the begining next time and not from the place we left of.
r210886:
Implement configuration reload on SIGHUP. This includes:
- Load added resources.
- Stop and forget removed resources.
- Update modified resources in least intrusive way, ie. don't touch
/dev/hast/<name> unless path to local component or provider name were
modified.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r210892:
Document 'none' value for remote.
Reviewed by: dougb
r211397:
Fix typos, spelling, formatting and mdoc mistakes found by Nobuyuki while
translating these manual pages. Minor corrections by me.
The 'size' variable is there to limit how many bytes we want to copy from
'addr'. It is very likely that size of 'addr' is larger than 'size', so checking
strlcpy() return value is bogus.
r211452:
For some setups sending data in 128kB chunks makes communication very slow. No
idea why. 32kB on the other hand seems to work properly everywhere.
Reported by: Thomas Steen Rasmussen <thomas@gibfest.dk>
r211875:
Make comment more readable.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211876:
Add mtx_owned() implementation.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211877:
Add QUEUE_INSERT() and QUEUE_TAKE() macros that simplify the code a bit.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211878:
We have sync_start() function to start synchronization, introduce sync_stop()
function to stop it.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211879:
Log that synchronization was interrupted in a proper place.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211880:
Don't increase number synchronized bytes in case of an error.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211881:
- Remove redundant and incorrect 'old' word from debug message.
- Log disconnects as warnings.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211882:
Implement keepalive mechanism inside HAST protocol so we can detect secondary
node failures quickly for HAST resources that are rarely modified.
Remove XXX from a comment now that the guard thread never sleeps infinitely.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211883:
Reduce indent where possible.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211884:
When logging to stdout/stderr don't close those descriptors after fork().
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211885:
- Run hooks in background - don't block waiting for them to finish.
- Keep all hooks we're running in a global list, so we can report when
they finish and also report when they are running for too long.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211886:
Allow to execute specified program on various HAST events.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211887:
Document new 'exec' parameter.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211895:
Add hooks execution.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211896:
Check if no signals were delivered just before going to sleep.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211897:
Correct when we log interrupted synchronization.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211898:
When logging to stdout/stderr, flush after each log.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211899:
When SIGTERM or SIGINT is received, terminate worker processes.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211975:
Implement mtx_destroy() and rw_destroy().
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211976:
- Add hook_fini() which should be called after fork() from the main hastd
process, once it start to use hooks.
- Add hook_check_one() in case the caller expects different child processes
and once it can recognize it, it will pass pid and status to hook_check_one().
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211977:
Allow to run hooks from the main hastd process.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211978:
- Call hook on role change.
- Document new event.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211979:
Disconnect after logging errors.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211981:
- Move functionality responsible for checking one connection to separate
function to make code more readable.
- Be sure not to reconnect too often in case of signal delivery, etc.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211982:
Use sigtimedwait(2) for signals handling in primary process.
This fixes various races and eliminates use of pthread* API in signal handler.
Pointed out by: kib
With help from: jilles
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211983:
Execute hook when split-brain is detected.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211984:
Execute hook when connection between the nodes is established or lost.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212033:
Constify arguments we can constify.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212034:
Use pjdlog_exit() before fork().
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212036:
When someone gives NULL as data, assume this is because he want to declare
connection side only.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212037:
We only want to know if descriptors are ready for reading.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212038:
Because it is very hard to make fork(2) from threaded process safe (we are
limited to async-signal safe functions in the child process), move all hooks
execution to the main (non-threaded) process.
Do it by maintaining connection (socketpair) between child and parent
and sending events from the child to parent, so it can execute the hook.
This is step in right direction for others reasons too. For example there is
one less problem to drop privs in worker processes.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212046:
Mask only those signals that we want to handle.
Suggested by: jilles
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212049:
Forgot to add event.c and event.h in r212038.
Pointed out by: pluknet <pluknet@gmail.com>
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
Add __dead2 to functions that we know they are going to exit.
r213003:
Sort includes.
r213004:
If we are unable to receive control message is most likely because the main
process died. Instead of entering infinite loop, terminate.
r213006:
Fix descriptor leaks: when child exits, we have to close control and event
socket pairs. We did that only in one case out of three.
r213007:
Fix possible deadlock where worker process sends an event to the main process
while the main process sends control message to the worker process, but worker
process hasn't started control thread yet, because it waits for reply from the
main process.
The fix is to start the control thread before sending any events.
Reported and fix suggested by: Mikolaj Golub <to.my.trociny@gmail.com>
r213008:
Assert that descriptor numbers are sane.
r213009:
Switch to sigprocmask(2) API also in the main process and secondary process.
This way the primary process inherits signal mask from the main process,
which fixes a race where signal is delivered to the primary process before
configuring signal mask.
Rework r210248. Although it fixed most of problems, it did not fix one
particular edge case where X-axis resolution is not multiple of font width.
Now we just advance enough scan lines, then deduct a partial scan line.
It is more intuitive than the previous code. Apply the same wisdom to EGA
and VGA planar renderers for consistency.
marius [Wed, 22 Sep 2010 21:54:30 +0000 (21:54 +0000)]
MFC: r207608, r207614, r212651, r212665
Go ahead and merge the work edwin@ on tftpd into the tree. It is a
lot better than what's in the tree now. Edwin tested it at a prior
employer, but can't test it today. I've found that it works a lot
better with the various uboot versions that I've used in my embedded
work. Here's the pkg-descr from the port that describes the changes:
It all started when we got some new routers, which told me the
following when trying to upload configuration or download images
from it: The TFTP server doesn't support the blocksize option.
My curiousity was triggered, it took me some reading of RFCs and
other documentation to find out what was possible and what could
be done. Was plain TFTP very simple in its handshake, TFTP with
options was kind of messy because of its backwards capability: The
first packet returned could either be an acknowledgement of options,
or the first data packet.
Going through the source code of src/libexec/tftpd and going through
the code of src/usr.bin/tftp showed that there was a lot of duplicate
code, and the addition of options would only increase the amount
of duplicate code. After all, both the client and the server can
act as a sender and receiver.
At the end, it ended up with a nearly complete rewrite of the tftp
client and server. It has been tested against the following TFTP
clients and servers:
- Itself (yay!)
- The standard FreeBSD tftp client and server
- The Fedora Core 6 tftp client and server
- Cisco router tftp client
- Extreme Networks tftp client
It supports the following RFCs:
RFC1350 - THE TFTP PROTOCOL (REVISION 2)
RFC2347 - TFTP Option Extension
RFC2348 - TFTP Blocksize Option
RFC2349 - TFTP Timeout Interval and Transfer Size Options
RFC3617 - Uniform Resource Identifier (URI) Scheme and Applicability
Statement for the Trivial File Transfer Protocol (TFTP)
It supports the following unofficial TFTP Options as described at
http://www.compuphase.com/tftp.htm:
blksize2 - Block size restricted to powers of 2, excluding protocol headers
rollover - Block counter roll-over (roll back to zero or to one)
From the tftp program point of view the following things are changed:
- New commands: "blocksize", "blocksize2", "rollover" and "options"
- Development features: "debug" and "packetdrop"
If you try this tftp/tftpd implementation, please let me know if
it works (or doesn't work) and against which implementaion so I can
get a list of confirmed working systems.
marius [Wed, 22 Sep 2010 21:54:13 +0000 (21:54 +0000)]
MFC: r207607, r207621, r209112, r209550, r209551
Go ahead and merge the work edwin@ on tftpd into the tree. It is a
lot better than what's in the tree now. Edwin tested it at a prior
employer, but can't test it today. I've found that it works a lot
better with the various uboot versions that I've used in my embedded
work. Here's the pkg-descr from the port that describes the changes:
It all started when we got some new routers, which told me the
following when trying to upload configuration or download images
from it: The TFTP server doesn't support the blocksize option.
My curiousity was triggered, it took me some reading of RFCs and
other documentation to find out what was possible and what could
be done. Was plain TFTP very simple in its handshake, TFTP with
options was kind of messy because of its backwards capability: The
first packet returned could either be an acknowledgement of options,
or the first data packet.
Going through the source code of src/libexec/tftpd and going through
the code of src/usr.bin/tftp showed that there was a lot of duplicate
code, and the addition of options would only increase the amount
of duplicate code. After all, both the client and the server can
act as a sender and receiver.
At the end, it ended up with a nearly complete rewrite of the tftp
client and server. It has been tested against the following TFTP
clients and servers:
- Itself (yay!)
- The standard FreeBSD tftp client and server
- The Fedora Core 6 tftp client and server
- Cisco router tftp client
- Extreme Networks tftp client
It supports the following RFCs:
RFC1350 - THE TFTP PROTOCOL (REVISION 2)
RFC2347 - TFTP Option Extension
RFC2348 - TFTP Blocksize Option
RFC2349 - TFTP Timeout Interval and Transfer Size Options
RFC3617 - Uniform Resource Identifier (URI) Scheme and Applicability
Statement for the Trivial File Transfer Protocol (TFTP)
It supports the following unofficial TFTP Options as described at
http://www.compuphase.com/tftp.htm:
blksize2 - Block size restricted to powers of 2, excluding protocol headers
rollover - Block counter roll-over (roll back to zero or to one)
From the tftp program point of view the following things are changed:
- New commands: "blocksize", "blocksize2", "rollover" and "options"
- Development features: "debug" and "packetdrop"
If you try this tftp/tftpd implementation, please let me know if
it works (or doesn't work) and against which implementaion so I can
get a list of confirmed working systems.
marius [Wed, 22 Sep 2010 20:17:33 +0000 (20:17 +0000)]
MFC: r212729
Merge from powerpc:
- Change putc_func_t to use a char instead of an int for the character.
- Make functions and variables not used outside of this source file static.
- Remove unused prototypes and variables.
- The OFW read and seek methods take 3 and not 4 input arguments.
marius [Wed, 22 Sep 2010 20:15:34 +0000 (20:15 +0000)]
MFC: r212725
Merge r207585 (MFC'ed to stable/8 in r208086) from cas(4):
- Don't probe for PHYs if we already know to use a SERDES. Unlike as with
cas(4) this only serves to speed up the the device attach though and can
only be determined via the OFW device tree but not from the VPD.
- Don't touch the MIF when using a SERDES.
- Add some missing bus space barriers, mainly in the PCS code path.
marius [Wed, 22 Sep 2010 20:03:59 +0000 (20:03 +0000)]
MFC: rr212709, r212730
Add a VIS-based block copy function for SPARC64 V and later, which
additionally takes advantage of the prefetch cache of these CPUs.
Unlike the uncommitted US-III version, which provide no measurable
speedup or even resulted in a slight slowdown on certain CPUs models
compared to using the US-I version with these, the SPARC64 version
actually results in a slight improvement.
marius [Wed, 22 Sep 2010 19:59:11 +0000 (19:59 +0000)]
MFC: r212676
Sync with other platforms:
- make dflt_lock() always panic,
- add kludge to use contigmalloc() when the alignment is larger than the size
and print a diagnostic when we didn't satisfy the alignment.
marius [Wed, 22 Sep 2010 19:55:37 +0000 (19:55 +0000)]
MFC: r212663
- Update the comment in swi_vm() regarding busdma bounce buffers; it's
unlikely that support for these ever will be implemented on sparc64 as
the IOMMUs are able to translate to up to the maximum physical address
supported by the respective machine, bypassing the IOMMU is affected
by hardware errata and being able to support DMA engines which cannot
do at least 32-bit DMA does not justify the costs.
- The page zeroing in uma_small_alloc() may use the VIS-based block zero
function so take advantage of it.
MFC r197804 (rwatson):
Add basename_r(3) to complement basename(3). basename_r(3) which accepts
a caller-allocated buffer of at least MAXPATHLEN, rather than using a
global buffer.
Note about semantics: while this interface is not POSIXy, there's
another major platform that uses it (Android) and the semantics between
the two platforms are pretty much the same.
GCC defines built-ins for atomic instructions found on i486 and higher.
Because FreeBSD no longer supports the 80386 cpu all code targeting
FreeBSD/i386 necessarily runs on i486 or higher so the compiler
built-ins can be used by default inside libstdc++ and in C++ headers.
This allows newly compiled C++ code to inline some atomic operations.
Old binaries continue to use libstdc++ functions.
ed [Tue, 21 Sep 2010 07:01:00 +0000 (07:01 +0000)]
MFC r211598:
Add support for whiteouts on tmpfs.
Right now unionfs only allows filesystems to be mounted on top of
another if it supports whiteouts. Even though I have sent a patch to
daichi@ to let unionfs work without it, we'd better also add support for
whiteouts to tmpfs.
This patch implements .vop_whiteout and makes necessary changes to
lookup() and readdir() to take them into account. We must also make sure
that when adding or removing a file, we honour the componentname's
DOWHITEOUT and ISWHITEOUT, to prevent duplicate filenames.
Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
The barrier semantics of bioq_insert_tail() were broken in two ways:
o In bioq_disksort(), an added bio could be inserted at the head of
the queue, even when a barrier was present, if the sort key for
the new entry was less than that of the last queued barrier bio.
o The last_offset used to generate the sort key for newly queued bios
did not stay at the position of the barrier until either the
barrier was de-queued, or a new barrier (which updates last_offset)
was queued. When a barrier is in effect, we know that the disk
will pass through the barrier position just before the
"blocked bios" are released, so using the barrier's offset for
last_offset is the optimal choice.
sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
o Update last_offset in bioq_insert_tail().
o Only update last_offset in bioq_remove() if the removed bio is
at the head of the queue (typically due to a call via
bioq_takefirst()) and no barrier is active.
o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
set prev to the barrier and cur to it's next element. Now that
last_offset is kept at the barrier position, this change isn't
strictly necessary, but since we have to take a decision branch
anyway, it does avoid one, no-op, loop iteration in the while
loop that immediately follows.
o In bioq_disksort(), bypass the normal sort for bios with the
BIO_ORDERED attribute and instead insert them into the queue
with bioq_insert_tail(). bioq_insert_tail() not only gives
the desired command order during insertion, but also provides
barrier semantics so that commands disksorted in the future
cannot pass the just enqueued transaction.
sys/sys/bio.h:
Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
Use an ordered command for SCSI/ATA-NCQ commands issued in
response to bios with the BIO_ORDERED flag set.
sys/cam/scsi/scsi_da.c
Use an ordered tag when issuing a synchronize cache command.
Wrap some lines to 80 columns.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
Mark bios with the BIO_FLUSH command as BIO_ORDERED.
MFC 212293:
Store the full timestamp when caching timestamps of files and
directories for purposes of validating name cache entries. This
closes races where two updates to a file or directory within the same
second could result in stale entries in the name cache.
To preserve the ABI of 'struct nfsnode', the existing timestamp fields
are left with 'n_unusedX' placeholders along with the unused 'n_expiry'
field. The larger n_ctime and n_dmtime fields are added to the end of
the structure.
MFC 209907,212326,212368,212749:
- Provide more defines for PCI-Express device ctrl.
- Add register definitions related to extended capability IDs in
PCI-express. I used PCIZ_* for ID constants (plain capability IDs use
PCIY_*).
- Add register definitions for the Advanced Error Reporting, Virtual
Channels, and Device Serial Number extended capabilities.
- Teach pciconf -c to list extended as well as plain capabilities for
PCI-express devices. Adds more detailed parsing for AER, VC, and
device serial numbers.
MFC 212369:
- Use 'sta' to hold the PCIR_STATUS register value instead of 'cmd' when
walking the capability list.
- Use constants for PCI header types instead of magic numbers.
Correct logic bug in aicasm's undefined register bit access detection code.
The code in question verifies that all register write operations only change
bits that are defined (in the register definition file) for that effected
register. The bug effectively disabled this checking.
o Fix the check by testing the opcode against all supported read ("and" based)
operands.
o Add missing bit definitions to the aic7xxx and aic79xx register definition
files so that the warning (treated as a fatal error) does not spuriously
fire.
When using pf routing options, properly handle IP fragmentation
for interfaces with TSO enabled, otherwise one would see an extra
ICMP unreach, frag needed pre matching packet on lo0.
This syncs pf code to ip_output.c r162084.
Submitted by: yongari via mlaier
Reviewed by: eri
Tested by: kib
PR: kern/144311