In g_eli_read_done() and g_eli_write_done(), for a bio with
bio_children > 1, g_destroy_bio() is never called and the bio
leaks. Fix this by calling g_destroy_bio() earlier, before the check.
Submitted by: Victor Balada Diaz <victor@bsdes.net> (initial version)
Use timeout from configuration file not only when sending and receiving,
but also when establishing connection.
r220007 (pjd):
Add mapsize to the header just before sending the packet.
Previously it could change later and we were sending an invalid mapsize.
Some time ago I added an optimization where, when nodes are connected for the
first time and there have been no writes to them yet, no initial full
synchronization is done. This bug prevented it from working.
r220266 (pjd):
Handle the problem described in r220264 by using a GEOM GATE queue of unlimited
length. This should fix deadlocks reported by HAST users.
r220270 (pjd):
Allow disabling sends or receives on a socket using shutdown(2) by
interpreting a NULL 'data' argument passed to proto_common_send() or
proto_common_recv() as a request to do so.
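A minimal sketch of this convention, using hypothetical helpers common_send() and common_recv() with a simplified (fd, data, size) signature; the real proto_common_send()/proto_common_recv() signatures are not shown in this log:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical sender: data == NULL means "no more sends on this
 * socket" and is implemented with shutdown(2). Returns 0 or errno.
 */
static int
common_send(int fd, const void *data, size_t size)
{
	const unsigned char *p = data;
	ssize_t done;

	if (data == NULL)
		return (shutdown(fd, SHUT_WR) == -1 ? errno : 0);
	while (size > 0) {
		done = send(fd, p, size, 0);
		if (done == -1) {
			if (errno == EINTR)
				continue;
			return (errno);
		}
		p += done;
		size -= (size_t)done;
	}
	return (0);
}

/* Hypothetical receiver: data == NULL disables further receives. */
static int
common_recv(int fd, void *data, size_t size)
{
	unsigned char *p = data;
	ssize_t done;

	if (data == NULL)
		return (shutdown(fd, SHUT_RD) == -1 ? errno : 0);
	while (size > 0) {
		done = recv(fd, p, size, 0);
		if (done == -1) {
			if (errno == EINTR)
				continue;
			return (errno);
		}
		if (done == 0)
			return (ENOTCONN);	/* peer closed before 'size' bytes */
		p += done;
		size -= (size_t)done;
	}
	return (0);
}
```

After the sending side passes NULL, the peer sees end-of-file, which matches HAST's use of one socket per direction.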
r220271 (pjd):
Declare directions for sockets between primary and secondary.
In HAST we use two sockets - one only for sending data and one only for
receiving data.
r220272 (pjd):
When we are operating on a blocking socket and get EAGAIN on send(2) or
recv(2), it means that the request timed out. Translate the meaningless EAGAIN
to ETIMEDOUT to give the administrator a hint that the timeout in the
configuration file might need to be increased.
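The translation can be sketched as follows, assuming the timeout is applied with the standard SO_SNDTIMEO/SO_RCVTIMEO socket options; set_timeouts() and recv_timed() are hypothetical helper names, not HAST's functions:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Set both send and receive timeouts (in seconds) on a blocking socket. */
static int
set_timeouts(int fd, int seconds)
{
	struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1 ||
	    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	return (0);
}

/*
 * recv(2) wrapper: on a blocking socket with SO_RCVTIMEO set, EAGAIN
 * (EWOULDBLOCK) means the timeout expired, so report ETIMEDOUT instead.
 */
static ssize_t
recv_timed(int fd, void *buf, size_t len)
{
	ssize_t n = recv(fd, buf, len, 0);

	if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
		errno = ETIMEDOUT;
	return (n);
}
```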
r220273 (pjd):
Handle ENOBUFS on send(2) by retrying for a while and logging the problem.
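A sketch of such a retry loop; the retry count, the 0.1 second pause and the send_retry() name are assumptions for illustration, not values taken from HAST:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

#define ENOBUFS_RETRIES	10	/* assumed retry budget */

/*
 * send(2) wrapper that retries for a while when the kernel reports
 * ENOBUFS (temporarily out of buffer space), logging each retry.
 */
static ssize_t
send_retry(int fd, const void *buf, size_t len)
{
	const struct timespec delay = { 0, 100000000L };	/* 0.1 s */
	ssize_t n;
	int tries;

	for (tries = 0;; tries++) {
		n = send(fd, buf, len, 0);
		if (n != -1 || errno != ENOBUFS || tries == ENOBUFS_RETRIES)
			return (n);
		fprintf(stderr, "send: ENOBUFS, retrying (%d/%d)\n",
		    tries + 1, ENOBUFS_RETRIES);
		nanosleep(&delay, NULL);
	}
}
```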
r220274 (pjd):
Increase default timeout from 5 seconds to 20 seconds. 5 seconds is definitely
too short under heavy load and I was experiencing those timeouts in my recent
tests.
GEOM has an internal mechanism to deal with ENOMEM errors returned via
g_io_deliver(). In such a case it increases the 'pace' counter on each ENOMEM
and reschedules the request. The 'pace' counter is decreased for each request
going down, but while 'pace' is greater than zero, GEOM will handle at most 10
requests per second. For GEOM GATE users that are proxies to local GEOM
providers (like ggatel(8) and HAST) we can end up with an almost permanent
slowdown of the GEOM down queue. This is because once we reach the GEOM GATE
queue limit, we return ENOMEM to GEOM. This means that we have, e.g., 1024 I/O
requests in the GEOM GATE queue. To make room in the queue and stop returning
ENOMEM we of course need to process the requests, but those requests are
handled by userland daemons that handle them by reading/writing also from/to
local GEOM providers.
For example with HAST, a new request comes to /dev/hast/data, which is a GEOM
GATE provider. GEOM GATE passes the request to hastd(8) and hastd(8)
reads/writes from/to /dev/da0. Once we reach the GEOM GATE queue limit, to free
up a slot in the GEOM GATE queue, hastd(8) has to read/write from/to /dev/da0,
but this request will also be very slow, because GEOM now slows down all the
requests. We end up with a full queue that we can unload at the speed of 10
requests per second. This simply looks like a deadlock.
Fix it by allowing userland daemons that work with both GEOM GATE and local
GEOM providers to specify unlimited queue size, so GEOM GATE will never return
ENOMEM to the GEOM.
MFC 220382:
Correct 'list scan' description in the examples. The previous description
was incorrect - 'list scan' does not actually do a scan, but instead lists
the results of the background 'scan' cache.
Make `make tinderbox` work with MAKEOBJDIRPREFIX set (or possibly in other
combinations) by forcing FAILFILE into .CURDIR as we do for all other
universe output files. [1] Similarly, make FAILFILE start with "_." as well.
Push a possible "unbind" in some situations from in6_pcbsetport() to its
callers. This also fixes a problem where the prison call could set
inp->in6p_laddr (laddr) and a following priv_check_cred() call
would then return an error. It will also allow us to merge the IPv4 and IPv6
implementations.
Make sure the locally cached value of rt->rt_gateway stays stable,
even after dropping the reference and unlocking. Previously we
could dereference a NULL pointer (after r121765).
Simply unlocking after the block does not work either because of
lock ordering (see r121765) and in addition we would still hold
a pointer to something that might be gone by the time we access it.
Thus take a copy of the value rather than just caching the pointer.
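The difference between caching a pointer and copying the value can be sketched in userland, with a mutex standing in for the rtentry lock; struct route_entry and route_copy_gateway() are hypothetical:

```c
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical route entry; only the fields needed for the sketch. */
struct route_entry {
	pthread_mutex_t		 lock;
	struct sockaddr_storage	*gateway;	/* may change or be freed once unlocked */
};

/*
 * Copy the gateway address into caller-owned storage while the entry
 * is locked. After unlocking, the caller only uses its private copy,
 * so a concurrent change or free of rt->gateway cannot affect it.
 * Returns 0 on success, -1 if there is no gateway.
 */
static int
route_copy_gateway(struct route_entry *rt, struct sockaddr_storage *out)
{
	int error = -1;

	pthread_mutex_lock(&rt->lock);
	if (rt->gateway != NULL) {
		memcpy(out, rt->gateway, sizeof(*out));
		error = 0;
	}
	pthread_mutex_unlock(&rt->lock);
	return (error);
}
```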
For now remove options FLOWTABLE from the remaining GENERIC kernel
configurations and make it opt-in for those who want it. LINT will
still build it.
While it may be a clear win in some scenarios, it still causes trouble for
users (see PRs) in general cases. In addition, we still allocate resources
even when it is disabled by sysctl, and we still leak arp/nd6 entries on
interface destruction.
Discussed with: qingli (2010-11-24, just never executed)
Discussed with: juli (OCTEON1)
PR: kern/148018, kern/155604, kern/144917, kern/146792
MFC r220249,220252:
r220249:
64bit DMA caused data corruption. Unfortunately there is no known
workaround that allows using 64bit DMA.
Disable 64bit DMA on Attansic L1 controller.
- Remove bsdlabel.5
- Remove bsdlabel test-script that was full of broken assumptions
- Remove dead code depending on __alpha__
- Widen fields that display partition offset/length.
MFhead 220317:
When removing ifnets, we should first remove the reference to the ifnet
from the interface index, then decrease the refcount, not vice versa.
Otherwise there is a (reproducible) race when if_free_internal()
contends on IFNET_WLOCK(), and we end up with a zero-refed ifnet in the
index for a long time. It may be picked up by some other thread
running ifnet_byindex_ref(), which takes the ifnet from the index
and bumps the refcount. When the reader drops the lock, if_free_internal()
proceeds with the free. Then the reader tries to free it a second time.
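The safe ordering can be sketched with a hypothetical index protected by one mutex; these *_sketch() functions only model the ordering described above and are not the kernel's if_free_internal() or ifnet_byindex_ref():

```c
#include <pthread.h>
#include <stdlib.h>

#define INDEX_MAX	16	/* hypothetical index size */

/* Hypothetical stand-in for struct ifnet; refcount guarded by index_lock. */
struct ifnet_s {
	int	refcount;
	int	index;
};

static pthread_mutex_t index_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ifnet_s *index_table[INDEX_MAX];

/* Reader: look the entry up and take a reference under the lock. */
static struct ifnet_s *
ifnet_byindex_ref_sketch(int idx)
{
	struct ifnet_s *ifp;

	pthread_mutex_lock(&index_lock);
	ifp = index_table[idx];
	if (ifp != NULL)
		ifp->refcount++;
	pthread_mutex_unlock(&index_lock);
	return (ifp);
}

/* Drop a reference; free the object when the last one goes away. */
static void
ifnet_rele_sketch(struct ifnet_s *ifp)
{
	int last;

	pthread_mutex_lock(&index_lock);
	last = (--ifp->refcount == 0);
	pthread_mutex_unlock(&index_lock);
	if (last)
		free(ifp);
}

/*
 * Detach in the safe order: first unpublish the pointer from the
 * index, then drop the index's reference. A reader can therefore
 * never find an entry whose refcount has already reached zero.
 */
static void
ifnet_detach_sketch(struct ifnet_s *ifp)
{
	pthread_mutex_lock(&index_lock);
	index_table[ifp->index] = NULL;
	pthread_mutex_unlock(&index_lock);
	ifnet_rele_sketch(ifp);
}
```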
Improve locking of creating and dropping links in the graph by acquiring
the topology mutex in the following functions, which manipulate pointers
to peer nodes:
- ng_bypass()
- ng_path2noderef() when switching to the next node in sequence.
Rewrite the function a bit.
- ng_address_hook()
- ng_address_path()
This patch improves stability of large mpd5 installations.
Redo r166423. It is important not only to skip freeing multicast
entries when the underlying interface is detached, but also to purge
pointers to them, to avoid a double free in the future.
- add static and const where appropriate
- check pointers against NULL
- minor styling nits
- it is actually WARNS=6 clean for non-strict alignment platforms
MFC r220103:
Normally fxp(4) does not receive bad frames, but promiscuous mode
makes the controller receive bad frames, and the i82557 will also receive
bad frames since fxp(4) has to receive VLAN oversized frames. If
fxp(4) encounters a DMA overrun error, the received frame size would
be 0, so the actual frame length after extracting the checksum field
would be negative (-2). Due to the signed/unsigned comparison
used in the driver, the frame length check did not work for DMA overrun
frames. Correct this by casting it to int.
While I'm here, explicitly check for the DMA overrun error and discard the
frame regardless of the result of the received frame length check.
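The signed/unsigned pitfall can be reproduced in isolation. The constants and function names below are hypothetical; they only show why a length of -2 passed the old check and fails the corrected one:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHECKSUM_LEN	2		/* assumed size of the trailing checksum field */
static const size_t min_len = 14;	/* assumed minimum frame length */

/* Buggy: 'len' is converted to size_t, so -2 becomes a huge value. */
static bool
frame_too_short_buggy(uint16_t total_len)
{
	int len = total_len - CHECKSUM_LEN;

	return (len < min_len);
}

/* Fixed: compare as signed int, as the commit does with its cast. */
static bool
frame_too_short_fixed(uint16_t total_len)
{
	int len = total_len - CHECKSUM_LEN;

	return (len < (int)min_len);
}
```

With total_len == 0 (the DMA overrun case), the buggy version reports the frame as long enough, while the fixed version rejects it.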
MFC r219845, r219930, r219949 and r219983.
- Use software to compute EHCI data toggle instead of hardware.
- Fix EHCI initialisation order with regard to debug prints.
MFC r219048, r219004, r218475 and r204632.
- The NetBSD Foundation has granted permission to remove clauses 3 and 4 from
their software.
- use device_printf() instead of printf() to give more accurate warnings.
- use memcpy() instead of bcopy().
- add missing #if's for non-FreeBSD compilation.
- Add missing xhci(4) manual page.
- Minor update in some USB manual pages.
- Correct USB 3.0 wire-speed to 5.0Gbps
In g_gate_create() there is a window between when g_gate_softc is
registered in g_gate_units array and when its sc_provider field is
filled. If during this period g_gate_units is accessed by another
thread that is checking for provider name collisions, a crash is
possible.
Fix this by adding an sc_name field to struct g_gate_softc. In
g_gate_create(), when g_gate_softc is created but sc_provider is not
set yet, sc_name points to the provider name stored in a local array.
Reported by: Freddie Cash <fjwcash@gmail.com>
r220173:
Increase debug level on g_gate device destruction and add message on
device creation.
Change for Africa/Casablanca:
- On 3 April 2011 at 00:00:00, [it] will be 3 April 01:00:00
- On 31 July 2011 at 00:59:59, [it] will be 31 July 00:00:00
Update for SouthAmerica/Chile:
- Chile's clocks will go back an hour this year on the 7th of May instead
of this Saturday. They will go forward again the 3rd Saturday in
August, not in October as they have since 1968. This is a pilot plan
which will be reevaluated in 2012.
ktrace_resize_pool() locking slightly reworked:
1) do not take a lock around a single atomic operation.
2) do not lose the lock invariant by dropping and reacquiring
ktrace_mtx around free() or malloc().
MFC r219042:
Introduce preliminary support for showing a description of the ABI of the
traced process by adding two new events which record the value of the
process's sv_flags to the trace file at process creation, exec and exit time.
MFC r219311:
Partially rework r219042.
The reason for this is a bug in ktrops() where the process was dereferenced
without holding a lock. This might cause a panic if ktrace was run
with the -p flag and the specified process exited between dropping
the lock and writing sv_flags.
Since it is impossible to acquire an sx lock while holding a mutex, switch
to using the asynchronous enqueuerequest() instead of writerequest().
Rename ktr_getrequest_ne() to a more understandable name.
MFC r219312:
Fix indentation in comment, double ';' in variable declaration.
edwin [Thu, 31 Mar 2011 06:29:15 +0000 (06:29 +0000)]
MFC of 198254, 198255, 198350, 198267, 209190, 208831, 208830, 210243
198254:
When tzsetup is run as non-root and the "CMOS clock question on
UTC" is answered as No, it would abort without properly ending the
dialog session.
198255:
Make the usage of the default zoneinfo file to install clearer.
198350:
- Add support for chrooted installs.
- Add examples to the man-page.
198267:
Instead of having to know which timezone was picked last time, you
can now run "tzsetup -r", which will reinstall the last choice. This
data is recorded in /var/db/zoneinfo.
209190:
Use literal format strings. Found by clang.
208831:
Add comment that this value is unused.
It is obvious that it isn't used, but both clang and Coverity talk about it.
208830:
When there is a problem with writing, also bail out.
Allow checksumming on-the-wire data using either CRC32 or SHA256.
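For reference, a bitwise sketch of the common CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). Whether HAST uses exactly this variant is an assumption here, and SHA256 would normally come from a crypto library rather than be hand-written:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320),
 * the variant used by zlib and Ethernet. Slow but table-free.
 */
static uint32_t
crc32_ieee(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;
	int bit;

	while (len-- > 0) {
		crc ^= *p++;
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return (crc ^ 0xFFFFFFFFu);
}
```

The sender would compute the checksum over the payload and ship it in the packet header; the receiver recomputes it and compares.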
r219354 (pjd):
Allow compressing on-the-wire data using two algorithms:
- HOLE - it simply turns all-zero blocks into a few-byte header;
it is extremely fast, so it is turned on by default;
it is mostly intended to speed up initial synchronization
where we expect many zeros;
- LZF - very fast algorithm by Marc Alexander Lehmann, which shows
very decent compression ratio and has BSD license.
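The HOLE idea can be sketched as below. The one-byte tag plus 32-bit length layout is a hypothetical wire format chosen for illustration, not HAST's actual header:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: 1 tag byte, then either a 32-bit length
 * (HOLE) or the raw block (RAW). Not HAST's actual header layout. */
#define TAG_RAW		0x00
#define TAG_HOLE	0x01

static bool
all_zero(const unsigned char *p, size_t len)
{
	while (len-- > 0)
		if (*p++ != 0)
			return (false);
	return (true);
}

/* Encode 'len' bytes of 'in'; 'out' must hold len + 5 bytes.
 * Returns the number of bytes written. */
static size_t
hole_compress(const void *in, size_t len, unsigned char *out)
{
	uint32_t len32 = (uint32_t)len;

	if (all_zero(in, len)) {
		out[0] = TAG_HOLE;
		memcpy(out + 1, &len32, sizeof(len32));
		return (1 + sizeof(len32));
	}
	out[0] = TAG_RAW;
	memcpy(out + 1, in, len);
	return (1 + len);
}

/* Decode 'inlen' bytes; returns the decompressed length in 'out'. */
static size_t
hole_decompress(const unsigned char *in, size_t inlen, unsigned char *out)
{
	uint32_t len32;

	if (in[0] == TAG_HOLE) {
		memcpy(&len32, in + 1, sizeof(len32));
		memset(out, 0, len32);
		return (len32);
	}
	memcpy(out, in + 1, inlen - 1);
	return (inlen - 1);
}
```

Only the all-zero test costs anything, which is why this kind of scheme is cheap enough to enable by default.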
r219369 (pjd):
Provide three states for pjdlog_initialized, so we can also tell that
this is the first initialization ever.
r219370 (pjd), r219385 (pjd):
- Turn on printf extensions.
- Load support for %T for printing time.
- Add support for %N for printing number in human readable form.
- Add support for %S for printing sockaddr structure (currently only AF_INET
family is supported, as this is all we need in HAST).
- Disable gcc compile-time format checking as this will no longer work.
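A standalone sketch of human-readable number formatting in the spirit of %N; the 1024-based units and one-decimal output are assumptions, and pjdlog's actual %N output may differ (FreeBSD's libutil also provides humanize_number(3) for this purpose):

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format 'n' using 1024-based unit prefixes, e.g. 1536 -> "1.5K".
 * Hypothetical output style, not pjdlog's exact %N format.
 */
static void
format_human(char *buf, size_t size, uint64_t n)
{
	static const char units[] = "KMGTPE";
	double v = (double)n;
	int u = -1;

	while (v >= 1024.0 && u < 5) {
		v /= 1024.0;
		u++;
	}
	if (u < 0)
		snprintf(buf, size, "%ju", (uintmax_t)n);
	else
		snprintf(buf, size, "%.1f%c", v, units[u]);
}
```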
r219371 (pjd):
Use %S to print IP address and port number.
r219372 (pjd):
- Log size of data to synchronize in human readable form (using %N).
- Log synchronization time (using %T).
- Log synchronization speed in human readable form (using %N).
r219373 (pjd):
Print some of the numbers in human readable form (using %N).
r219482:
Make workers inherit debug level from the main process.
r219620 (pjd):
In command line options allow size to be specified using k/M/G/T
suffixes.
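A sketch of suffix parsing with 1024-based multipliers; parse_size() is a hypothetical name, and FreeBSD's libutil offers expand_number(3) for the same job:

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "512", "64k", "10M", "2G" or "1T" into bytes (1024-based).
 * Returns 0 on success, -1 with errno set (EINVAL or ERANGE).
 */
static int
parse_size(const char *str, uint64_t *out)
{
	char *end;
	uintmax_t val;
	int shift = 0;

	if (*str < '0' || *str > '9') {	/* rejects "", "-1", " 5" */
		errno = EINVAL;
		return (-1);
	}
	errno = 0;
	val = strtoumax(str, &end, 10);
	if (errno != 0)
		return (-1);
	switch (*end) {
	case 'k': case 'K': shift = 10; end++; break;
	case 'm': case 'M': shift = 20; end++; break;
	case 'g': case 'G': shift = 30; end++; break;
	case 't': case 'T': shift = 40; end++; break;
	case '\0': break;
	default: errno = EINVAL; return (-1);
	}
	if (*end != '\0') {
		errno = EINVAL;
		return (-1);
	}
	if (val > (UINT64_MAX >> shift)) {	/* would overflow 64 bits */
		errno = ERANGE;
		return (-1);
	}
	*out = (uint64_t)val << shift;
	return (0);
}
```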
r219669 (pjd):
Remove #include needed for debugging.
r219721:
For the secondary, set a timeout of 2 * HAST_KEEPALIVE seconds on the incoming
connection so the worker will exit if it does not receive packets from
the primary during this interval.
Reported by: Christian Vogt <Christian.Vogt@haw-hamburg.de>
Tested by: Christian Vogt <Christian.Vogt@haw-hamburg.de>
r219813 (pjd):
If there was any traffic on one of our descriptors, we were not checking for
long-running hooks. Fix it by not using the select(2) timeout to decide whether
to check hooks or not.
r219814 (pjd):
When creating connection on behalf of primary worker, set pjdlog prefix
to resource name and role, so that any logs related to that can be identified
properly.
r219815 (pjd):
Add snprlcat() and vsnprlcat() - the functions I'm always missing.
They work as a combination of snprintf(3) and strlcat(3) - the caller
can append a string built from the given format.
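A minimal sketch of the idea, built on vsnprintf(3). The return convention (mirroring strlcat(3)) is an assumption; HAST's own snprlcat() may return a different type or value:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Append formatted output to the NUL-terminated string in 'str'
 * (buffer of 'size' bytes). Like strlcat(3), return the length the
 * result would have had with unlimited space; >= size means truncation.
 */
static size_t
vsnprlcat(char *str, size_t size, const char *fmt, va_list ap)
{
	size_t len = strnlen(str, size);
	int n;

	if (len == size)	/* no NUL in the buffer: only measure */
		n = vsnprintf(NULL, 0, fmt, ap);
	else
		n = vsnprintf(str + len, size - len, fmt, ap);
	return (len + (n > 0 ? (size_t)n : 0));
}

static size_t
snprlcat(char *str, size_t size, const char *fmt, ...)
{
	va_list ap;
	size_t ret;

	va_start(ap, fmt);
	ret = vsnprlcat(str, size, fmt, ap);
	va_end(ap);
	return (ret);
}
```

For example, starting from "port=", snprlcat(buf, sizeof(buf), "%d", 8457) leaves "port=8457" in buf.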
r219816 (pjd):
Use snprlcat() instead of two strlcat(3)s.
r219817 (pjd):
Log when we start hooks checking and when we execute a hook.
r219818 (pjd), r219821 (pjd):
In hast.conf we define the other node's address in 'remote' variable.
This way we know how to connect to secondary node when we are primary.
The same variable is used by the secondary node - it only accepts
connections from the address stored in 'remote' variable.
In cluster configurations it is common that each node has its individual
IP address and there is one additional shared IP address which is assigned
to the primary node. It seems possible that if the shared IP address is
from the same network as the individual IP address, it might be chosen by
the kernel as the source address for the connection with the secondary node.
Such a connection will be rejected by the secondary, as it doesn't come from
the primary node's individual IP.
Add 'source' variable that allows to specify source IP address we want to
bind to before connecting to the secondary node.
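The bind-before-connect pattern behind 'source' can be sketched for IPv4 like this; connect_from() is a hypothetical helper, not hastd's code:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Connect to 'dst' using 'src' as the local address. Binding before
 * connect(2) stops the kernel from picking another local address,
 * such as a shared cluster IP on the same network. A zero port in
 * 'src' lets the kernel choose an ephemeral port.
 * Returns the connected socket or -1.
 */
static int
connect_from(const struct sockaddr_in *src, const struct sockaddr_in *dst)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd == -1)
		return (-1);
	if (bind(fd, (const struct sockaddr *)src, sizeof(*src)) == -1 ||
	    connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```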
r219821 (pjd):
Forgot to commit this as a part of r219818.
r219830 (pjd):
Detect situation where resource internal identifier differs.
This means that both nodes have separately managed resources that don't
have the same data.
r219831 (pjd):
Be pedantic and free nvout before exiting.
r219832 (pjd):
Increase debug level of "Checking hooks." message.
r219833 (pjd):
Remove stale comment. Yes, it is valid to set role back to init.
r219837 (pjd):
Before handling any events on descriptors check signals so we can update
our info about worker processes if any of them was terminated in the meantime.
This fixes the problem with 'hastctl status' running from a hook called on
split-brain:
1. Secondary calls a hook and terminates.
2. Hook asks for resource status via 'hastctl status'.
3. The main hastd handles the status request by sending it to the secondary
worker, which is already dead, but because signals weren't checked yet it
doesn't know that and we get EPIPE.
r219843 (pjd):
Fix typo.
r219844 (pjd):
Initialize localcnt on first write. This fixes an assertion failure when we
create a resource, set its role to primary, do no writes, then switch it to
secondary and accept a connection from the primary.
r219864 (pjd):
White space cleanups.
r219873 (pjd), r219873 (pjd):
The proto API is a general purpose API, so don't use 'hast' in structures or
function names. It can now be used outside of HAST.
r219879:
For requests that are sent only to remote component use the
error from remote.
r219882:
After synchronization is complete we should make the primary counters
equal to the secondary counters:
trociny [Tue, 29 Mar 2011 17:59:30 +0000 (17:59 +0000)]
MFC r219368:
r219368 (pjd):
To be able to use printf extensions we need to turn off gcc format checking.
Following the convention of NO_WERROR and NO_WCAST_ALIGN add NO_WFORMAT,
which, when defined in Makefile, turns off compile-time format checking
(by adding -Wno-format), but still allows using a high WARNS level.
trociny [Tue, 29 Mar 2011 17:52:45 +0000 (17:52 +0000)]
MFC r219342, r219346:
r219342 (pjd):
Fix various issues in how %#T is handled:
- If precision is 0, don't print period followed by no digits.
- If precision is 0, stop printing units as soon as possible
(e.g. if we have three years and five days and precision is 0,
print only 3y5d).
- If precision is not 0, print all units (e.g. 3y0d0h0m0s.00).
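The rules above can be modeled with a small standalone formatter. This is a sketch, not the libc implementation: format_duration() is hypothetical, a year is assumed to be 365 days, and the fraction is truncated from microseconds:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of the %#T rules: precision 0 prints from the
 * largest to the smallest non-zero unit; otherwise every unit down to
 * seconds is printed, followed by 'prec' digits of the fraction.
 */
static void
format_duration(char *buf, size_t size, uint64_t secs, uint32_t usecs,
    int prec)
{
	static const uint64_t unit[] = { 365 * 86400ULL, 86400, 3600, 60, 1 };
	static const char sym[] = "ydhms";
	uint64_t val[5];
	uint32_t div = 1;
	size_t off;
	int i, first, last = 4;

	for (i = 0; i < 5; i++) {
		val[i] = secs / unit[i];
		secs %= unit[i];
	}
	/* Largest non-zero unit; seconds if the duration is zero. */
	for (first = 0; first < 4 && val[first] == 0; first++)
		;
	/* With precision 0, stop after the smallest non-zero unit. */
	if (prec == 0)
		while (last > first && val[last] == 0)
			last--;
	buf[0] = '\0';
	for (i = first; i <= last; i++) {
		off = strlen(buf);
		snprintf(buf + off, size - off, "%ju%c", (uintmax_t)val[i], sym[i]);
	}
	if (prec > 0) {
		if (prec > 6)
			prec = 6;	/* microsecond resolution at most */
		for (i = prec; i < 6; i++)
			div *= 10;
		off = strlen(buf);
		snprintf(buf + off, size - off, ".%0*u", prec, (unsigned)(usecs / div));
	}
}
```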
r219346 (pjd):
Because we call __printf_out() with an on-stack buffer, also call
__printf_flush() so we are sure it won't be referenced after we return.
Use more consistent function name with the others (pjdlogv_prefix_set()
instead of pjdlog_prefix_setv()).
r217732 (pjd):
Add nv_assert() which allows asserting that the given name exists.
r217737 (pjd):
Add missing logs.
r217784 (pjd):
Don't open configuration file from worker process. Handle SIGHUP in the
master process only and pass changes to the worker processes over control
socket. This removes access to global namespace in preparation for capsicum
sandboxing.
r217958 (pjd):
Remove __dead2 from pjdlog_verify() prototype, it does return sometimes.
r217961 (pjd):
- Remove obvious NOTREACHED comment after abort() call.
- Remove redundant newline at the end of the file.
r217962 (pjd):
Add LOG_NDELAY flag to openlog(3) - we want the descriptor to be opened
immediately so there are no surprises once we start chrooting or using capsicum.
r217964 (pjd):
Use pjd copyright for 2011 work.
r217965 (pjd):
Add functions to initialize/finalize pjdlog. This allows opening/closing the
log file at will.
r217966 (pjd):
Extend pjdlog_verify() to support the following additional macros:
PJDLOG_RVERIFY() - always check expression and on false log the given message
and exit.
PJDLOG_RASSERT() - check expression when NDEBUG is not defined and on false log
the given message and exit.
PJDLOG_ABORT() - log the given message and exit.
r217967 (pjd):
Close the control socket before exiting, so it will be unlinked.
r217969 (pjd):
Remember created control connection so on fork(2) we can close it in child.
r218040 (pjd):
Initialize all global variables on pjdlog_init().
r218041 (pjd):
Add function to close all unneeded descriptors after fork(2).
r218042 (pjd):
Add comments to places where we treat errors as critical, but it is possible
to handle them more gracefully.
r218043 (pjd):
Close all unneeded descriptors after fork(2).
r218044 (pjd):
Add function to assert that the only descriptors we have open are the ones
we expect to be open. Also assert that they point at the expected type.
Because openlog(3) API is unable to tell us descriptor number it is using, we
have to close syslog socket, remember assert message in local buffer and if we
fail on assertion, reopen syslog socket and log the message.
r218045 (pjd):
Use newly added descriptors_assert() function to ensure only expected
descriptors are open.
r218046 (pjd), r218047 (pjd), r218119 (maxim):
Add 'hast' user and 'hast' group that will be used by hastd (and maybe hastctl)
to drop privileges.
r218048 (pjd):
Implement function that drops privileges by:
- chrooting to /var/empty (user hast home directory),
- setting groups to 'hast' (user hast primary group),
- setting real group id, effective group id and saved group id to 'hast',
- setting real user id, effective user id and saved user id to 'hast'.
At the end verify that those operations were successful.
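The sequence above can be sketched with standard calls. drop_privileges() is a hypothetical name, and this sketch uses setgid(2)/setuid(2), which, when called by root, set the real, effective and saved IDs together, rather than setresgid()/setresuid():

```c
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Drop root privileges to 'username': chroot into its home directory,
 * make its primary group the only group, then set group and user IDs.
 * Returns 0 on success, -1 on any failure.
 */
static int
drop_privileges(const char *username)
{
	struct passwd *pw;
	gid_t gid;
	uid_t uid;

	pw = getpwnam(username);
	if (pw == NULL)
		return (-1);
	gid = pw->pw_gid;
	uid = pw->pw_uid;

	if (chroot(pw->pw_dir) == -1 || chdir("/") == -1)
		return (-1);
	if (setgroups(1, &gid) == -1 || setgid(gid) == -1 || setuid(uid) == -1)
		return (-1);

	/* Verify: the IDs changed and root cannot be regained. */
	if (getuid() != uid || geteuid() != uid ||
	    getgid() != gid || getegid() != gid)
		return (-1);
	if (uid != 0 && setuid(0) == 0)
		return (-1);
	return (0);
}
```

The order matters: groups must be changed while still root, because after setuid(2) the process no longer has the privilege to do so.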
r218049 (pjd):
Drop privileges in worker processes.
Accepting connections and handshaking in the secondary is still done before
dropping privileges. It should be implemented by only accepting connections in
the privileged main process and passing connection descriptors to the worker,
but this is not implemented yet.
r218132 (pjd):
Rename pjdlog_verify() to pjdlog_abort() as it better describes what
the function does, and mark it with __dead2.
r218138 (pjd):
- Use pjdlog for assertions and aborts as this will log assert/abort message
to syslog if we run in background.
- Assert in proto.c that the method we want to call is implemented and remove
dummy methods from protocol implementations that are only there to abort
the program with a nice message.
r218139 (pjd):
Implement two new functions for sending and receiving descriptors
over UNIX domain sockets and socket pairs.
This is in preparation for capsicum.
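Descriptor passing over UNIX domain sockets is done with SCM_RIGHTS control messages. A minimal sketch; send_fd() and recv_fd() are hypothetical names, not the proto API:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send descriptor 'fd' over the UNIX domain socket 'sock'. */
static int
send_fd(int sock, int fd)
{
	struct msghdr msg;
	struct cmsghdr *cmsg;
	struct iovec iov;
	char byte = 0;
	union {
		struct cmsghdr hdr;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &byte;	/* at least one byte of real data */
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/* Receive a descriptor sent with send_fd(); returns it or -1. */
static int
recv_fd(int sock)
{
	struct msghdr msg;
	struct cmsghdr *cmsg;
	struct iovec iov;
	char byte;
	int fd;
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);
	if (recvmsg(sock, &msg, 0) != 1)
		return (-1);
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return (-1);
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return (fd);
}
```

The received descriptor is a new number in the receiving process that refers to the same open file as the sender's descriptor.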
r218147 (pjd), r218148 (pjd):
Fix build on ia64.
r218158 (pjd):
Do not set socket send and receive buffer sizes. They will be auto-tuned.
Confirmed by: rwatson
r218185 (pjd):
Be prepared that hp_client or hp_server might be NULL now.
r218191 (pjd):
Move protocol allocation and deallocation to separate functions.
r218192 (pjd), r218201 (bz):
Allow the caller to specify the connection timeout.
r218193 (pjd):
Add proto_connect_wait() to wait for connection to finish.
If timeout argument to proto_connect() is -1, then the caller needs to use
this new function to wait for connection.
This change is in preparation for capsicum, where a sandboxed worker wants
to ask the main process to connect on the worker's behalf and pass the descriptor
to the worker. Because we don't want the main process to wait for the
connection, it will start async connection and pass descriptor to the
worker who will be responsible for waiting for the connection to finish.
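The split between starting a connection and waiting for it can be sketched with a non-blocking connect(2), poll(2) and SO_ERROR; connect_start() and connect_wait() are hypothetical stand-ins for proto_connect()/proto_connect_wait():

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Start a non-blocking connect(2). Returns 0 if the connection is
 * already established, 1 if it is in progress, or -1 on error.
 */
static int
connect_start(int fd, const struct sockaddr *sa, socklen_t salen)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
		return (-1);
	if (connect(fd, sa, salen) == 0)
		return (0);
	return (errno == EINPROGRESS ? 1 : -1);
}

/*
 * Wait up to 'timeout' seconds for an in-progress connection.
 * Returns 0 when connected, -1 with errno set (ETIMEDOUT on timeout).
 */
static int
connect_wait(int fd, int timeout)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	socklen_t len;
	int error, n;

	do {
		n = poll(&pfd, 1, timeout * 1000);
	} while (n == -1 && errno == EINTR);
	if (n == -1)
		return (-1);
	if (n == 0) {
		errno = ETIMEDOUT;
		return (-1);
	}
	/* Writable: the final connect(2) status is in SO_ERROR. */
	len = sizeof(error);
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len) == -1)
		return (-1);
	if (error != 0) {
		errno = error;
		return (-1);
	}
	return (0);
}
```

In the capsicum scenario, the privileged process would only call the first half and pass the descriptor on, leaving the second half to the worker.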
r218194 (pjd):
- Rename proto_descriptor_{send,recv}() functions to
proto_connection_{send,recv} and change them to return proto_conn
structure. We don't operate directly on descriptors, but on
proto_conns.
- Add wrap method to wrap descriptor with proto_conn.
- Remove methods to send and receive descriptors and implement this
functionality as additional argument to send and receive methods.
r218214 (pjd):
Let the caller log info about successful privilege drop.
We don't want to log this in hastctl.
r218215 (pjd):
Drop privileges after connecting to hastd, but before sending or receiving
anything.
r218217 (pjd):
Add missing locking after moving keepalive_send() to remote send thread
in r214692.
r218218 (pjd):
Set up another socketpair between parent and child, so that the primary sandboxed
worker can ask the main privileged process to connect on the worker's behalf
and then we can migrate descriptor using this socketpair to worker.
This is not really needed now, but will be needed once we start to use
capsicum for sandboxing.
r218370 (pjd):
Close more descriptors that can be open if the worker process for the given
resource is already running.
Add (void) casts before snprintf(3) calls whose return values we are not
interested in.
r218376 (pjd):
Now that we break the loop on fstat(2) failure we no longer need to satisfy
gcc's imperfections.
r218464 (pjd):
Unlink UNIX domain socket file only if:
1. The descriptor is the one we are listening on (not the one used when we
connect as a client and not the one created on accept(2)).
2. Descriptor was created by us (PID matches with the PID stored on bind(2)).
yongari [Mon, 28 Mar 2011 00:13:41 +0000 (00:13 +0000)]
MFC r219701:
Remove too expensive bus_dmamap_sync(9) call in dc_rx_resync().
With this change, driver may not notice updated descriptor status
change when bounce buffers are active. However, dc_rxeof() in the next run
will handle the synchronization.
Change dc_rxeof() a bit to return the number of processed frames in the
RX descriptor ring. Previously it returned the number of frames
that were successfully passed to the upper stack, which in turn means it
ignored frames that were discarded due to errors. The number of
processed frames in the RX descriptor ring is used to detect whether the
driver is out of sync with the controller's current descriptor pointer.
Returning the number of processed frames reduces unnecessary (probably
wrong) re-synchronization.
yongari [Sun, 27 Mar 2011 23:13:02 +0000 (23:13 +0000)]
MFC r219407:
Rearrange dc_tx_underrun() a bit to correctly set the TX FIFO threshold
value. Controllers that always require "store and forward" mode
(Davicom and PNIC 82C168) have no way to recover from a TX underrun
except completely reinitializing the hardware. Previously only Davicom
was reinitialized, and the TX FIFO threshold was changed not to use
"store and forward" mode after reinitialization since the default
FIFO threshold value was 0. This effectively disabled the Davicom
controller's "store and forward" mode once it encountered TX
underruns. In theory, this can cause watchdog timeouts.
The Intel 21143 controller requires the TX MAC to be idle before
changing the TX FIFO threshold, so the driver tried to disable the TX MAC
and checked whether it saw the idle state of the TX MAC. The driver should
perform a full hardware reinitialization if the TX MAC fails to enter the
idle state, and it should not touch the TX MAC again once it has performed
a full reinitialization.
While I'm here, remove resetting the TX FIFO threshold to 0 when the
interface is put into the down state. If the driver ever encountered a TX
underrun, it is likely to trigger TX underruns again whenever the
interface is brought up again. Keeping the old/learned TX FIFO
threshold value should reduce the chance of seeing TX underruns in
the next run.