Free mbuf in case when protocol in unknown in ng_ipfw_rcvdata().
This change fixes (theoretically) possible mbuf leak introduced in
r225586. Reorder code a bit and change return codes to be more specific
- Add ipfw eXtended tables permitting radix to be used for any kind of keys.
- Add support for IPv6 and interface extended tables
- Make number of tables to be changed in runtime in range 0..65534.
- Use IP_FW3 opcode for all new extended table cmds
No ABI changes are introduced. Old userland will see valid tables for
IPv4 tables and no entries otherwise. Flush works for any table.
IP_FW3 socket option is used to encapsulate all new opcodes:
/* IP_FW3 header/opcodes */
typedef struct _ip_fw3_opheader {
uint16_t opcode; /* Operation opcode */
uint16_t reserved[3]; /* Align to 64-bit boundary */
} ip_fw3_opheader;
New opcodes added:
IP_FW_TABLE_XADD, IP_FW_TABLE_XDEL, IP_FW_TABLE_XGETSIZE, IP_FW_TABLE_XLIST
ipfw(8) table argument parsing behavior is changed:
'ipfw table 999 add some-unqualified-host' now assumes
'some-unqualified-host' to be interface name instead of hostname.
New tunable:
net.inet.ip.fw.tables_max controls number of table supported by ipfw in given
VNET instance. 128 is still the default value.
Sysctl change:
net.inet.ip.fw.tables_max is now read-write.
New syntax:
ipfw add skipto tablearg ip from any to any via table(42) in
ipfw add skipto tablearg ip from any to any via table(4242) out
This is a bit hackish, special interface name '\1' is used to signal interface
table number is passed in p.glob field.
MFC r234121:
Back out r228476.
r228476 fixed superfluous link UP/DOWN messages but broke IPMI
access during boot. It's not clear why r228476 breaks IPMI and
should be revisited.
Reported by: Paul Guyot <paulguyot <> ieee dot org >
MFC r233387:
Use suspend/resume methods provided by net80211. This ensures that the
appropriate state handling takes place, not doing so results in the
device doing nothing until manual intervention.
Improve the way to calculate available pages in tmpfs:
- Don't deduct wired pages from total usable counts because it does not
make any sense. To make things worse, on systems where swap size is
smaller than physical memory and use a lot of wired pages (e.g. ZFS),
tmpfs can suddenly have free space of 0 because of this;
- Count cached pages as available; [1]
- Don't count inactive pages as available, technically we could but that
might be too aggressive; [1]
Don astbestos garment and remove the warning about TMPFS being experimental
-- highly experimental even. So far the closest to a bug in TMPFS that people
have gotten to relates to how ZFS can take away from the memory that TMPFS
needs. One can argue that such is not a bug in TMPFS. Irrespective, even if
there is a bug here and there in TMPFS, it's not in our own advantage to
scare people away from using TMPFS. I for one have been using it, even with
ZFS, very successfully.
Change SIGUSR1 to SIGTHR to properly wake up a process that is being
traced. The use of SIGUSR1 caused traced processes (those attached to
with dtrace -p) to exit when dtrace exited.
marius [Fri, 20 Apr 2012 14:45:57 +0000 (14:45 +0000)]
MFC: r222475
Fix read_ivar implementation for MMC and SD.
1. Both mmc_read_ivar() and sdhci_read_ivar() use the expression
'*(int *)result = val' to assign to result which is uintptr_t *.
This does not work on big-endian 64 bit systems.
2. The media_size ivar is declared as 'off_t' which does not fit
into uintptr_t in 32bit systems, change this to long.
Submitted by: kanthms at netlogicmicro com (initial version)
Export vinactive() from kern/vfs_subr.c (e.g., make it no longer
static and declare its prototype in sys/vnode.h) so that it can be
called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c)
instead of the body of vinactive() being cut and pasted into
process_deferred_inactive().
MFC r234172:
Add thread-private flag to indicate that error value is already placed
in td_errno. Flag is supposed to be used by syscalls returning
EJUSTRETURN because errno was already placed into the usermode frame
by a call to set_syscall_retval(9). Both ktrace and dtrace get errno
value from td_errno if the flag is set.
Use the flag to fix sigsuspend(2) error return ktrace records.
MFC r233176:
Add new GEOM_PART_LDM module that implements the Logical Disk Manager
scheme. The LDM is a logical volume manager for MS Windows NT and it
is also known as dynamic volumes. It supports about 2000 partitions
and also provides the capability for software RAID implementations.
This version implements only partitioning scheme capability and based
on the linux-ntfs project documentation and several publications across
the Web. NOTE: JBOD, RAID0 and RAID5 volumes aren't supported.
An access to the LDM metadata is read-only. When LDM is on the disk
partitioned with MBR we can also destroy metadata. For the GPT
partitioned disks destroy action is not supported.
MFC r233177:
Connect geom_part_ldm module to the build.
MFC r233178:
Connect geom_part_ldm to the kernel build.
MFC r233181:
Add CTLFLAG_TUN to sysctls.
MFC r233651:
Do proper cleanup for the GPT case when an error occurs.
MFC r233652:
VMDB offset should be greater than logical volume size only for MBR.
Some software think a mutex can be destroyed after it owned it, for
example, it uses a serialization point like following:
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
pthread_mutex_destroy(&muetx);
They think a previous lock holder should have already left the mutex and
is no longer referencing it, so they destroy it. To be maximum compatible
with such code, we use IA64 version to unlock the mutex in kernel, remove
the two steps unlocking code.
MFC r232799:
- add comments to syscalls.master and linux(32)_dummy about which linux
kernel version introduced the sysctl (based upon a linux man-page)
- add comments to syscalls.master regarding some names of sysctls which are
different than the linux-names (based upon the linux unistd.h)
- add some dummy sysctls
- name an unimplemented sysctl
If hastd is invoked with "-P pidfile" option always create pidfile
regardless of whether -F (foreground) option is set or not.
Also, if -P option is specified, ignore pidfile setting from configuration
not only on start but on reload too. This fixes the issue when for hastd
run with -P option reload caused the pidfile change.
MFC r233000:
Add MODULE_DEPEND() to geom_part modules.
MFC r233342:
Check that scheme is not already registered. This may happens when a
KLD is preloaded with loader(8) and leads to infinity loop.
Also do not return EEXIST error code from MOD_LOAD handler, because
we have undocumented(?) ability replace kernel's module with preloaded one.
And if we have so, then preloaded module will be initialized first.
Thus error in MOD_LOAD handler will be triggered for the kernel.
MFC 233670,233671:
- Use VM_MEMATTR_UNCACHEABLE for the constant for UC memory rather than
VM_MEMATTR_UNCACHED on mips.
- Rename VM_MEMATTR_UNCACHED to VM_MEMATTR_WEAK_UNCACHEABLE on x86 to
be less ambiguous and more clearly identify what it means. An alias
from VM_MEMATTR_WEAK_UNCACHEABLE to VM_MEMATTR_WEAK_UNCACHED remains
on x86 to preserve the KPI.
- Remove the VM_MEMATTR_UNCACHED alias from powerpc.
MFC 219737,219740,233676:
Improve handling of MSI interrupts with HyperTransport devices:
- Always enable the HyperTransport MSI mapping window for HyperTransport
to PCI bridges (these show up as HyperTransport slave devices) on
powerpc.
- Enable the HyperTransport MSI mapping window for HyperTransport to PCI
bridges when an interrupt is configured via the PCIB_MAP_MSI() call
on x86.
MFC 232228,233613:
Move the DTrace return IDT vector back up from 0x20 to 0x92. The 0x20
vector is currently dedicated to servicing IRQ 0 from the 8259A's, so
it shouldn't be overloaded for DTrace.
MFC 232742:
MFamd64:
- Return failure for a suspend attempt if we have no wake address.
- Use intr_disable()/intr_restore() instead of ACPI_DISABLE_IRQS().
- Invoke intr_suspend() earlier and call intr_resume() if suspend
fails.
- Use pause in the loop waiting for CPU to suspend.
- Restore PAT MSR, switchtime, switchticks, and MTRRs on resume.
MFC r233688-233689:
r233688:
Remove task queue based link state change handler. Driver no longer
needs to defer link state handling.
While I'm here, mark IFF_DRV_RUNNING before changing media. If
link is established without any delay, that link state change
handling could be lost.
r233689:
Do not report current link status if driver is not running.
This change also workarounds dhclient's link state handling bug by
not giving current link status.
Unlike other controllers, ale(4)'s PHY hibernation perfectly works
such that driver does not see a valid link if the controller is not
brought up. If dhclient(8) runs on ale(4) it will blindly waits
until link UP and then gives up after 10 seconds. Because
dhclient(8) still thinks interface got a valid link when IFM_AVALID
is not set for selected media, this change makes dhclient initiate
DHCP without waiting for link UP.
MFC r233585-233587:
r233585:
Partially revert r223608 and selectively allow microcode loading
for 82550C. For 82550 controllers this change restores CPUSaver
microcode loading. Due to silicon bug on 82550 and 82550C with
server extension, these controllers seem to require CPUSaver
microcode to receive fragmented UDP datagrams. However the
microcode shouldn't be used on client featured 82550C as it locks
up the controller. In addition, client featured 82550C does not
have the silicon bug. Also clear temporary memory used for
microcode loading since the same memory area is used for other
commands.
While I'm here use 82550C in probe message instead of generic
82550.
Reported by: Andreas Longwitz <longwitz <> incore de>
Tested by: Andreas Longwitz <longwitz <> incore de>
r233586:
Load entire EEPROM contents in device attach time and verify
whether the checksum of EEPROM is valid or not. Because driver
heavily relies on EEPROM information when it selectively enables
features/workarounds, it would be helpful to know whether driver
sees valid EEPROM.
While I'm here remove all other EEPROM accesses since the entire
EEPROM is loaded at device attach time.
r233587:
Remove unnecessary #if as the software workaround for PCI protocol
violation should be activated unless the system is cold-booted
after updating EEPROM.
The PCI protocol violation happens only when established link is
10Mbps so the workaround should be updated whenever link state
change is detected. Previously the workaround was activated only
when user checks current media status with ifconfig(8).
MFC r233100:
In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flag
if the filesystem performed short write and we are skipping the page
due to this.
Propogate write error from the pager back to the callers of
vm_pageout_flush(). Report the failure to write a page from the
requested range as the FALSE return value from vm_object_page_clean(),
and propagate it back to msync(2) to return EIO to usermode.
While there, convert the clearobjflags variable in the
vm_object_page_clean() and arguments of the helper functions to
boolean.
Properly mask off bits that are not supported in the IAP counters.
This fixes a bug where users would see massively large counts, near
to 2**64 -1, due to the bits not being cleared.
MFC: r2323467
The name caching changes of r230394 exposed an intermittent bug
in the new NFS server for NFSv4, where it would report ENOENT
when the file actually existed on the server. This turned out
to be caused by not initializing ni_topdir before calling lookup()
and there was a rare case where the value on the stack location
assigned to ni_topdir happened to be a pointer to a ".." entry,
such that "dp == ndp->ni_topdir" succeeded in lookup().
This patch initializes ni_topdir to fix the problem.
MFC: r232050
hrs@ reported a panic to freebsd-stable@ under the subject line
"panic in 8.3-PRERELEASE" on Feb. 22, 2012. This panic was caused
by use of a mix of tsleep() and msleep() calls on the same event
in the new NFS server DRC code. It did "mtx_unlock(); tsleep();"
in two places, which kib@ noted introduced a slight risk that the
wakeup() would occur before the tsleep(), resulting in a 10sec
delay before waking up. This patch fixes the problem by replacing
"mtx_unlock(); tsleep();" with mtx_sleep(..PDROP..). It also
changes a nfsmsleep() call to mtx_sleep() so that the code uses
mtx_sleep() consistently within the file.
marius [Sat, 7 Apr 2012 12:46:53 +0000 (12:46 +0000)]
MFC: r233827
Fix probing of SAS1068E with a device ID of 0x0059 after r232411 (MFC'ed
to stable/8 in r232563).
Reported by: infofarmer
MFC: r233886
Refine r233827; as it turns out, controllers with a device ID of 0x0059
can be upgraded to MegaRAID mode, in which case mfi(4) should attach to
these based on the sub-vendor and -device ID instead (not currently done).
Therefore, let mpt_pci_probe() return BUS_PROBE_LOW_PRIORITY.
While it, let mpt_pci_probe() return BUS_PROBE_DEFAULT instead of 0 in
the default case.
On start most of sysctl_kern_proc functions use the same pattern:
locate a process calling pfind() and do some additional checks like
p_candebug(). To reduce this code duplication a new function pget() is
introduced and used.
As the function may be useful not only in kern_proc.c it is in the
kernel name space.
marius [Wed, 4 Apr 2012 21:19:24 +0000 (21:19 +0000)]
MFC: r233747, r233748
- Fix panic on kernel traps having a mapping in trap_sig b0rked in r206086
(MFC'ed to stable/8 in r206198).
Reported by: David E. Cross
- Remove checks that are redundant due to tf_type being unsigned.
MFC r232165:
Introduce the "ruleset=number" option for devfs(5) mounts.
Add support for updating the devfs mount (currently only changing the
ruleset number is supported).
Check mnt_optnew with vfs_filteropt(9).
This new option sets the specified ruleset number as the active ruleset
of the new devfs mount and applies all its rules at mount time. If the
specified ruleset doesn't exist, a new empty ruleset is created.
MFC r232247 (partial):
mdoc(7) type - start new sentences on new line
MFC r232307:
Add "export" to devfs_opts[] and return EOPNOTSUPP if called with it.
Fixes mountd warnings.
MFC r230397 (pjd):
By default turn off prefetch when listing snapshots.
In my tests it makes listing snapshots 19% faster with cold cache and
47% faster with warm cache.
MFC r230438 (pjd):
Dramatically optimize listing snapshots when user requests only snapshot
names and wants to sort them by name, ie. when executes:
# zfs list -t snapshot -o name -s name
Because only name is needed we don't have to read all snapshot properties.
Below you can find how long does it take to list 34509 snapshots from a single
disk pool before and after this change with cold and warm cache:
before:
# time zfs list -t snapshot -o name -s name > /dev/null
cold cache: 525s
warm cache: 218s
after:
# time zfs list -t snapshot -o name -s name > /dev/null
cold cache: 1.7s
warm cache: 1.1s
MFC r232064:
Import illumos changeset 13608 [1]:
add support for "-t <datatype>" argument to zfs get
References:
https://www.illumos.org/issues/1936
Update zfs(8) manpage in respect of [1].
Fix typo in zfs(8) manpage.
MFC 232700:
Add a new sched_clear_name() method to the scheduler interface to clear
the cached name used for KTR_SCHED traces when a thread's name changes.
This way KTR_SCHED traces (and thus schedgraph) will notice when a thread's
name changes, most commonly via execve().
marius [Mon, 2 Apr 2012 20:14:40 +0000 (20:14 +0000)]
MFC: r233701
- Remove erroneous trailing semicolon. [1]
- Correctly determine the maximum payload size for setting the TX link
frequent NACK latency and replay timer thresholds.
Correct failure to attach the PV block front device on Citrix
XenServer configurations that advertise the multi-page ring extension,
but only allow a single page of ring space.
sys/dev/xen/blkfront/blkfront.c:
If only one page of ring space is being used, do not publish
in the XenStore the number of pages in use (1), via either
of the supported multi-page ring extension schemes.
Single page operation is the same with or without the
ring-page extension being negotiated. Relying on the
legacy behavior avoids an incompatible difference in how
the two ring-page extension schemes that are out in the
wild deal with the base case of a single page. The
Amazon/Red Hat drivers use the same XenStore variable as
if the extension was not negotiated. The Citrix drivers
assume the new ring reference XenStore variables will be
available.
Reported by: Oliver Schonefeld <schonefeld@ids-mannheim.de>
MFC r233101:
Add sysctl vfs.nfs.nfs_keep_dirty_on_error to switch the nfs client
behaviour on error from write RPC back to behaviour of old nfs client.
When set to not zero, the pages for which write failed are kept dirty.
Change the notes about the pidfile to include Doug's preference
for pre-creating the pidfile with appropriate owner and permissions.
Requested by dougb
r231909:
The pidfile_open(3) is going to be fixed to set close-on-exec in order
not to leak the descriptor after exec(3). This raises the issue for
daemon(3) of the pidfile lock to be lost when the child process
executes.
To solve this and also to have the pidfile cleaned up when the program
exits, if a pidfile is specified, spawn a child to exec the command
and wait in the parent keeping the pidfile locked until the child
process exits and remove the file.
If the supervising process receives SIGTERM, forward it to the spawned
process. Normally it will cause the child to exit followed by the
termination of the supervisor after removing the pidfile.
This looks like desirable behavior, because termination of a
supervisor usually supposes termination of its charge. Also it will
fix the issue with stale pid files after reboot due to init kills a
supervisor before its child exits.
r231911:
Add -r option to restart the program if it has been terminated.
Suggested by: Andrey Zonov <andrey zonov org>
r231912:
If permitted protect the supervisor against pageout kill.
marius [Sat, 31 Mar 2012 10:47:40 +0000 (10:47 +0000)]
MFC: r233427
Add cas(4), gem(4) and hme(4) to x86 GENERICs as suggested by netchild@ in
<20120222095239.Horde.0hpYHJjmRSRPRKzXsoFRbYk@webmail.leidinger.net>.
According to some private emails received, it apparently is not unpopular
to use at least Quad GigaSwift cards driven by cas(4) in x86 machines.