* Don't call the _fast version of the TLS accessor in terminate() or
unexpected().
1) TLS may not have been set up yet.
2) When we're in one of these functions, Really Bad Stuff has
happened and potentially saving a few cycles really isn't
important.
* Merge in fixes from FreeBSD trunk to make atomics work with recent
clang.
syslogd: Use closefrom() instead of getdtablesize()/close() loop.
When syslogd forks a process for '|' destinations, it closes all file
descriptors greater than 2.
Use closefrom() for this instead of a getdtablesize()/close() loop because
it is both faster and avoids leaving file descriptors open because the limit
was lowered after they were opened.
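The failure mode of the old loop can be reproduced with a small sketch. The helper name fd_survives_close_loop() and the descriptor numbers are illustrative assumptions, not syslogd code. The work happens in a forked child so the caller's own descriptors are left alone.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if descriptor `highfd` is still open after
 * the old getdtablesize()/close() loop ran with the soft limit lowered to
 * `lowered`, 0 if it was closed, and -1 on setup failure.
 */
int
fd_survives_close_loop(int highfd, rlim_t lowered)
{
	pid_t pid = fork();
	int status;

	if (pid == -1)
		return (-1);
	if (pid == 0) {
		struct rlimit rl;
		int fd, i;

		fd = open("/dev/null", O_RDONLY);
		if (fd == -1 || dup2(fd, highfd) == -1)
			_exit(2);
		/* Lower the limit after the descriptor was opened. */
		if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
			_exit(2);
		rl.rlim_cur = lowered;
		if (setrlimit(RLIMIT_NOFILE, &rl) == -1)
			_exit(2);
		/* The old loop: only reaches descriptors below the new limit. */
		for (i = 3; i < getdtablesize(); i++)
			(void)close(i);
		_exit(fcntl(highfd, F_GETFD) != -1 ? 1 : 0);
	}
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status));
}
```

Where closefrom() is available, the loop collapses to a single closefrom(3) call, which does not depend on the current descriptor limit.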
MFprojects/camlock r249542:
Remove ADA_FLAG_PACK_INVALID flag. Since ATA disks have no concept of media
change, it only duplicates the CAM_PERIPH_INVALID flag, so we can use the latter.
MFprojects/camlock r249541:
Give the periph validity flag its own periph reference. That slightly simplifies
the release logic and covers the hypothetical case where the lock is dropped
inside the periph_oninval() method.
Change maxio to reflect variable hardware configurations.
If max_sg_length is 0, then we default to 16.
If max_sg_length is less than CISS_MAX_SG_ELEMENTS, then
we will round the value of max_sg_length to the nearest
power of 2 and use it to align maxio.
Else, we will use CISS_MAX_SG_ELEMENTS for our calculations.
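The three cases above can be sketched as follows. This is not the driver code: effective_sg_length() and pow2_floor() are hypothetical names, driver_max stands in for CISS_MAX_SG_ELEMENTS (whose value is not given here), and "nearest power of 2" is assumed to mean rounding down so the result never exceeds what the controller reported.

```c
#include <stdint.h>

/* Fallback used when the controller reports 0 (from the commit text). */
#define DEFAULT_SG_LENGTH 16u

/* Round down to a power of two; n must be non-zero. */
static uint32_t
pow2_floor(uint32_t n)
{
	uint32_t p = 1;

	while (p <= n / 2)
		p *= 2;
	return (p);
}

/* Pick the scatter/gather length that the maxio calculation is based on. */
uint32_t
effective_sg_length(uint32_t max_sg_length, uint32_t driver_max)
{
	if (max_sg_length == 0)
		return (DEFAULT_SG_LENGTH);
	if (max_sg_length < driver_max)
		return (pow2_floor(max_sg_length));
	return (driver_max);
}
```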
Thanks to scottl for walking me through the history and providing
the basis for this patch.
Submitted by: scott
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
ed [Sat, 27 Apr 2013 05:44:39 +0000 (05:44 +0000)]
Remove references to MK_IDEA.
As of r249959, we want to build with IDEA support enabled
unconditionally. As this change removed the MK_IDEA flag, update these
Makefiles accordingly.
ed [Sat, 27 Apr 2013 04:56:02 +0000 (04:56 +0000)]
Unbreak <stdatomic.h> on ARM + Clang.
Clang only supports atomic operations for ARMv6. For non-ARMv6, we still
need to emit these functions.
Clang's prototype for these functions slightly differs, as it is truly
based on GCC's documentation. It requires the use of signed types, but
also requires varargs. Still, we are not allowed to simply implement
this function directly. Cleverly work around this by implementing it
under a different name and using __strong_reference().
Fix the framebuffer issues by calling pmap_mapdev() in the attach routine. This
will make the framebuffer region uncacheable and it will create a proper
KVA -> RAM mapping.
Remove the WITH_IDEA option and build it unconditionally.
The European version of the patent expired in 2011.
The US version of the patent expired in 2012 or prior.
Reviewed by: des
No objection from: cperciva, ehaupt
Properly sanitize --menu results (guards against Gtk library warnings from
the X11 side of things bleeding into Xdialog(1) stderr output). It should
be duly noted that such errors are not a by-product of anything in the
Xdialog(1) utility or API, but of optional libraries that it can link against
(such as Gtk1 versus Gtk2; if you compile xdialog from ports against Gtk2
AND misconfigure your fonts or generally make Gtk2 unhappy, these warning
messages can bleed into the captured stderr -- that is why we sanitize!).
Fix examples for overriding INSTALL to not suggest hardcoding
'install' since it breaks buildworld after the introduction and
use of 'install -l' in r245752. Overriding INSTALL causes
/usr/bin/install to be used instead of the proper
/usr/src/tools/install.sh which handles the new flag.
According to devctl(4), clients must read events whole; they may not
piece them together from multiple read() calls. It's as if /dev/devctl is
a datagram device instead of a stream device. However, devd's
internal buffer was too small (1025 bytes) to read an entire
ereport.fs.zfs.checksum event (variable, up to ~1300 bytes). This
commit enlarges the buffer to 8k.
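The effect of an undersized buffer on a datagram-like source can be illustrated with a sketch. It uses a SOCK_DGRAM socketpair as a stand-in for /dev/devctl (an assumption made purely for illustration; devd itself reads the device), and read_one_event() is a hypothetical helper.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: send one event_len-byte datagram and read it back
 * with a buffer of buf_len bytes. Returns the number of bytes the reader
 * received, or -1 on error. Both lengths must be at most 8192.
 */
ssize_t
read_one_event(size_t event_len, size_t buf_len)
{
	char event[8192], buf[8192];
	int sv[2];
	ssize_t n;

	if (event_len > sizeof(event) || buf_len > sizeof(buf))
		return (-1);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (-1);
	memset(event, 'x', event_len);
	if (write(sv[0], event, event_len) != (ssize_t)event_len)
		n = -1;
	else
		n = read(sv[1], buf, buf_len);	/* excess bytes are discarded */
	close(sv[0]);
	close(sv[1]);
	return (n);
}
```

With a 1025-byte buffer a 1300-byte event comes back truncated and the tail cannot be recovered by a second read; with an 8k buffer the event arrives whole.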
Adjust the min comparison to look at the adjusted value after subtraction; don't
subtract 1 from the chosen value if we are going to use the configured value.
Teach GEOM and CAM about the difference between the max "size" of r/w and delete
requests.
sys/geom/geom_disk.h:
- Added d_delmaxsize which represents the maximum size of individual
device delete requests in bytes. This can be used by devices to
inform geom of their size limitations regarding delete operations
which are generally different from the read / write limits as data
is not usually transferred from the host to physical device.
sys/geom/geom_disk.c:
- Use new d_delmaxsize to calculate the size of chunks passed through to
the underlying strategy during deletes instead of using read / write
optimised values. This defaults to d_maxsize if unset (0).
- Moved d_maxsize default up so it can be used to default d_delmaxsize
sys/cam/ata/ata_da.c:
- Added d_delmaxsize calculations for TRIM and CFA
sys/cam/scsi/scsi_da.c:
- Added re-calculation of d_delmaxsize whenever delete_method is set.
- Added kern.cam.da.X.delete_max sysctl which allows the max size for
delete requests to be limited. This is useful in preventing timeouts
on devices whose delete methods are slow. It should be noted that
this limit is reset when the device delete method is changed and
that it can only be lowered, not increased, from the device max.
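The defaulting and limiting rules described above can be sketched as follows. struct disk_limits and both helpers are hypothetical stand-ins, not the GEOM or CAM code. Whether the real sysctl handler clamps or rejects an oversized value is not stated here; this sketch clamps.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two size fields named above. */
struct disk_limits {
	uint64_t d_maxsize;	/* max read/write request, bytes */
	uint64_t d_delmaxsize;	/* max delete request, bytes; 0 = unset */
};

/* Delete chunk size: d_delmaxsize, defaulting to d_maxsize when unset. */
uint64_t
delete_chunk_size(const struct disk_limits *dl)
{
	return (dl->d_delmaxsize != 0 ? dl->d_delmaxsize : dl->d_maxsize);
}

/*
 * Apply a requested delete_max: it may lower the limit but never raise
 * it above what the device advertises for the current delete method.
 */
uint64_t
apply_delete_max(uint64_t device_max, uint64_t requested)
{
	return (requested < device_max ? requested : device_max);
}
```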
Added available delete methods discovery during device probe, including the
maximum sizes for said methods, which are used when processing BIO_DELETE
requests. This includes updating UNMAP support discovery to be based on
SBC-3 T10/1799-D Revision 31 specification.
Added ATA TRIM support to cam scsi devices via ATA Pass-Through(16)
sys/cam/scsi/scsi_da.c:
- Added ATA Data Set Management TRIM support via ATA Pass-Through(16)
as a delete_method
- Added four new probe states used to identify available methods and their
limits for the processing of BIO_DELETE commands via both UNMAP and the
new ATA TRIM commands.
- Renamed Probe states to better indicate their use
- Added delete method descriptions used when informing user of issues.
- Added automatic calculation of the optimum delete mode based on which
method presents the largest maximum request size as this is most likely
to result in the best performance.
- Added WRITE SAME max block limits
- Updated UNMAP range generation to mirror that used by ATA TRIM, this
optimises the generation of ranges and fixes a potential overflow
issue in the count when combining multiple BIO_DELETE requests
- Added output of warnings about short deletes. This should only ever
be triggered on devices that fail to correctly advertise their supported
delete modes / max sizes.
- Fixed WS16 requests being incorrectly limited to 65535 in length.
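The "largest maximum request size wins" selection can be sketched as below. struct delete_method and pick_delete_method() are hypothetical; the real driver keeps its own method table and probe state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one probed delete method. */
struct delete_method {
	const char *name;
	uint64_t max_bytes;	/* largest single request; 0 = unsupported */
};

/* Return the supported method with the largest max request, or NULL. */
const struct delete_method *
pick_delete_method(const struct delete_method *m, size_t n)
{
	const struct delete_method *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (m[i].max_bytes == 0)
			continue;
		if (best == NULL || m[i].max_bytes > best->max_bytes)
			best = &m[i];
	}
	return (best);
}
```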
Added a sysctl (kern.geom.dev.delete_max_sectors) to control the maximum
size of a delete request sent to the providing device performed by g_dev_ioctl.
This allows the kernel and apps via ioctl (e.g. newfs -E) to request large LBA
deletes, which significantly improves performance.
Previously this was hard coded to 65536 sectors; the new default is 262144,
which doubles the throughput of deletes on commonly available SSDs.
In tests on an Intel 520 120GB FW: 400i disk it improved the delete throughput
from 1.6GB/s to over 2.6GB/s on a full disk delete such as that done via
newfs -E
For some SSDs where delete time is pretty much constant, no matter what
the request, setting this to 0 will provide significantly better throughput
e.g. Samsung 840 240GB FW DXT07B0Q @ 262144 = 79G/s, @ 0 = 2259G/s
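The effect of the limit on the number of requests issued can be sketched with a little arithmetic. delete_request_count() is a hypothetical helper, and treating 0 as "do not split" is this sketch's reading of the text rather than a confirmed description of g_dev_ioctl.

```c
#include <stdint.h>

/*
 * Number of delete requests needed to cover `total` sectors when each
 * request is capped at `max_sectors`. A cap of 0 is treated as "no
 * split" (one request), which is this sketch's reading of the text.
 */
uint64_t
delete_request_count(uint64_t total, uint64_t max_sectors)
{
	if (total == 0)
		return (0);
	if (max_sectors == 0)
		return (1);
	/* Ceiling division without risking overflow in the addition. */
	return (total / max_sectors + (total % max_sectors != 0));
}
```

For a 1048576-sector range, the old limit of 65536 needs 16 requests and the new default of 262144 needs 4.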
Don't appease clang static analyzer after all and roll back
the free(3) of mntbuf ... again. There's no point in doing
useless extra work when we're about to exit.
Octeon 2 (6xxx) and newer CPUs don't use the CPU clock speed for their
I/O clock. Thankfully, the simple executive provides a way to query
the proper clock that works on all models. Move to asking for the SCLK
via this interface.
This gets the serial console working after we start init and open the
console and set the divisor (which turned the output from good to
bad). I can login on the console now.
Use a thread for the processing of virtio tx descriptors rather
than blocking the vCPU thread. This improves bulk data performance
by ~30-40% and doesn't harm req/resp time for stock netperf runs.
Future work will use a thread pool rather than a thread per tx queue.
In the case where the controller supports an sg_list LESS than our predefined
and tuned value, we would advertise the unsupported value to CAM and it would
merrily destroy the controller with way too many IO operations.
This manifests itself in a Zero Memory RAID configuration for a P410 and
possibly other controllers.
Remove deprecated APIs to get the total and free memory available to vmm.ko.
These APIs were relevant when memory for virtual machine allocation was
hard partitioned away from the rest of the system but that is no longer
the case. The sysctls that provided this information were garbage collected
a while back.
Use the offsets from pcb.h rather than regnum.h to store the registers
in the pcb. setjmp/longjmp in the kernel also used these values, so
continue to use them although their use isn't technically the pcb
register array (matching is all that's important for setjmp/longjmp in
the kernel). Finally, eliminate the old register names from regnum.h.
This is a lexical change only. The non-debug .o files have the same md5.
Introduce a pointer to const variable gw, which points either at the
same place as dst, or to the sockaddr in the routing table.
The const constraint of gw makes us safe from modifying the routing table
accidentally. And "constantness" of dst allows us to remove several
bandaids where we switched it back to &ro->ro_dst; now it always
points there.
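The pointer arrangement described above can be sketched as follows. The structures are simplified, hypothetical stand-ins for the kernel's route and sockaddr types, and select_gw() is an illustrative name only.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct addr {
	unsigned int a;
};
struct rtentry_sketch {
	int has_gateway;
	struct addr rt_gateway;
};
struct route_sketch {
	struct addr ro_dst;
	struct rtentry_sketch *ro_rt;
};

/*
 * dst always points at ro->ro_dst; gw points either at the same place or
 * at the gateway stored in the routing table. Because gw is a pointer to
 * const, writing through it is a compile-time error.
 */
const struct addr *
select_gw(struct route_sketch *ro)
{
	struct addr *const dst = &ro->ro_dst;
	const struct addr *gw = dst;

	if (ro->ro_rt != NULL && ro->ro_rt->has_gateway)
		gw = &ro->ro_rt->rt_gateway;
	/* gw->a = 0; would not compile: gw points to const data. */
	return (gw);
}
```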
Revert r248644 because of the regression for USDT probes.
USDT probes are advertised to the kernel by initialization code with
__attribute__((constructor)). It seems that on Solaris the .init-ish code
of the main object is executed before the RD_PREINIT point is hit. On
FreeBSD that is not the case. And because on FreeBSD there is no other
well-defined point between RD_PREINIT and main(), we have to parse a
DTrace script when main is hit, for the time being.
A footnote: currently we actually post RD_POSTINIT event, but that's a
bug because the event is triggered by hitting r_debug_state which
happens before any init code is executed.
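The ordering that the revert relies on, constructors running before main() is entered, can be seen in a minimal sketch. The names are hypothetical and the flag merely stands in for "probes have been advertised"; it assumes a GCC- or Clang-compatible compiler.

```c
/* Set by the constructor; stands in for "probes registered". */
static int probes_registered;

/* Runs during program initialization, before main() is entered. */
__attribute__((constructor))
static void
register_probes(void)
{
	probes_registered = 1;
}

/* Callable from main(): reports whether the constructor already ran. */
int
probes_ready(void)
{
	return (probes_registered);
}
```

By the time main() is reached the flag is already set, which is why main is a safe point to parse a script that refers to those probes.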
Add RIP-relative addressing to the instruction decoder.
Rework the guest register fetch code to allow the RIP to
be extracted from the VMCS while the kernel decoder is
functioning.
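For readers unfamiliar with the encoding, RIP-relative operands can be sketched as below. This is not the vmm decoder: both function names are hypothetical, and the caller is assumed to supply the RIP of the instruction being decoded plus its total length.

```c
#include <stdint.h>

/* In 64-bit mode, ModRM mod=00 with r/m=101 selects RIP + disp32. */
int
modrm_is_rip_relative(uint8_t modrm)
{
	return ((modrm & 0xC7) == 0x05);
}

/*
 * The displacement is relative to the address of the next instruction,
 * i.e. the RIP of the current instruction plus its total length.
 */
uint64_t
rip_relative_ea(uint64_t inst_rip, uint8_t inst_len, int32_t disp32)
{
	return (inst_rip + inst_len + (uint64_t)(int64_t)disp32);
}
```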
Move hptmv and mpt drivers shutdown a bit later to the SHUTDOWN_PRI_LAST
stage of shutdown_post_sync. That should allow CAM to do final cache flush
at the SHUTDOWN_PRI_DEFAULT without using polling magic.
This fixes the issue with the "randomly changing" default
route. There are two places in ip_output.c
where we do a goto again. One place was fine: it
copies out the new address and then resets dst = ro->rt_dst.
But the other place does *not* do that, which means that if we
found the gateway earlier and left dst pointing there
(dst = ro->rt_gateway), the goto again then
clobbers the default route.
The fix is just to move the again label so we are always
doing dst = &ro->rt_dst; in the again loop.
dim [Wed, 24 Apr 2013 17:20:45 +0000 (17:20 +0000)]
When rebooting (exiting) from the BTX loader, make sure to restore the
GDT from the correct segment, otherwise a triple fault would be caused.
In some virtual environments (VMware, VirtualBox, etc) this could lead
to an unhandled error or hang in the guest emulation software.
Thanks to avg and jhb for a few hints in the right direction.
Noticed by: Jeremy Chadwick <jdc@koitsu.org> (and many others)
MFC after: 1 week
andre [Wed, 24 Apr 2013 13:54:55 +0000 (13:54 +0000)]
Base the calculation of maxmbufmem in part on kmem_map size
instead of kernel_map size to prevent kernel memory exhaustion
by mbufs and a subsequent panic on physical page allocation
failure.
On architectures without a direct map all mbuf memory (except
for jumbo mbufs larger than PAGE_SIZE) comes from kmem_map.
Hence it is the limiting factor.
For architectures with a direct map using the size of kmem_map
is a good proxy of available kernel memory as well. If it is
much smaller the mbuf limit may be sub-optimal but remains
reasonable, while avoiding panics under exhaustion.
The overall mbuf memory limit calculation may be reconsidered
again later, however due to the many different mbuf sizes and
different backing KVM maps it is a tricky subject.
Found by: pho's new network stress test
Pointed out by: alc (kmem_map instead of kernel_map)
Tested by: pho
dim [Tue, 23 Apr 2013 18:58:39 +0000 (18:58 +0000)]
Pull in r180121 from upstream llvm trunk:
LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make
sure that the order in which the elements are scalarized is the same
as the original order.
This fixes a miscompilation in FreeBSD's regex library.
This should fix lib/libc/regex/regcomp.c at -O3 with clang 3.3 r178860
on CPUs with SSE. Before this change, the vectorizer could incorrectly
rearrange the second loop in computejumps(), leading to possibly invalid
entries in the re_guts::charjump table.
The net result was that for example "sed s/@CC@/foo/" failed to work
correctly, leading to trouble with many configure scripts.
Return a lun count of 1 and a lun id of 0 when CAM attempts a REPORT_LUNS
command on a disk device. This quiesces some noise on the console that
recently appeared.
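A response of that shape can be sketched as follows, based on the standard REPORT LUNS parameter data layout (a 4-byte big-endian list length in bytes, 4 reserved bytes, then one 8-byte entry per LUN). build_single_lun_report() is a hypothetical helper, not the code from this change.

```c
#include <stdint.h>
#include <string.h>

#define REPORT_LUNS_HDR_LEN	8	/* 4-byte list length + 4 reserved */
#define REPORT_LUNS_ENTRY_LEN	8	/* one 8-byte LUN per entry */

/*
 * Fill buf with a REPORT LUNS parameter list describing a single LUN 0.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t
build_single_lun_report(uint8_t *buf, size_t buflen)
{
	const size_t total = REPORT_LUNS_HDR_LEN + REPORT_LUNS_ENTRY_LEN;
	const uint32_t list_len = REPORT_LUNS_ENTRY_LEN;	/* 1 LUN */

	if (buflen < total)
		return (0);
	memset(buf, 0, total);		/* reserved bytes and LUN id 0 */
	buf[0] = (uint8_t)(list_len >> 24);	/* big-endian list length */
	buf[1] = (uint8_t)(list_len >> 16);
	buf[2] = (uint8_t)(list_len >> 8);
	buf[3] = (uint8_t)list_len;
	return (total);
}
```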
Teach the virtio block device to deal with direct as well as indirect
descriptors. Prior to this change the device would only work with guests
that chose to use indirect descriptors.
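The difference between the two descriptor styles can be sketched as below. The structure layout and flag values follow the virtio ring definition; count_segments() is a hypothetical helper, and for simplicity the sketch assumes `addr` already holds a host pointer, whereas a real device model must translate the guest-physical address first.

```c
#include <stdint.h>

#define VRING_DESC_F_NEXT	1	/* chain continues at `next` */
#define VRING_DESC_F_INDIRECT	4	/* buffer holds a descriptor table */

struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Count the descriptors of the request that starts at ring[head].
 * Returns -1 for a malformed chain (bad index, loop, empty table).
 */
int
count_segments(const struct vring_desc *ring, uint16_t ring_size,
    uint16_t head)
{
	const struct vring_desc *tbl = ring;
	uint32_t limit = ring_size;
	uint32_t idx = head;
	uint32_t n = 0;

	if (head >= ring_size)
		return (-1);
	if (ring[head].flags & VRING_DESC_F_INDIRECT) {
		/* Indirect: the head points at a separate table. */
		tbl = (const struct vring_desc *)(uintptr_t)ring[head].addr;
		limit = ring[head].len / sizeof(struct vring_desc);
		idx = 0;
	}
	/* Direct and indirect chains are walked the same way from here. */
	while (idx < limit && n < limit) {
		n++;
		if ((tbl[idx].flags & VRING_DESC_F_NEXT) == 0)
			return ((int)n);
		idx = tbl[idx].next;
	}
	return (-1);
}
```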
Modify the device reset callback to actually reset the device state.
Literally follow POSIX:
If the bs= expr operand is specified and no conversions other than sync,
noerror, or notrunc are requested, the data returned from each input
block shall be written as a separate output block.
In particular, when both bs=size and conv=sparse were specified, the
resulting file was fully filled instead of sparse.