.\"
.\" Copyright (c) 1999 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dt PCI 4
.Os
.Sh NAME
.Nm pci
.Nd generic PCI/PCIe bus driver
.Sh SYNOPSIS
To compile the PCI bus driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd device pci
.Ed
.Pp
To compile in support for Single Root I/O Virtualization
.Pq SR-IOV :
.Bd -ragged -offset indent
.Cd options PCI_IOV
.Ed
.Pp
To compile in support for native PCI-express HotPlug:
.Bd -ragged -offset indent
.Cd options PCI_HP
.Ed
.Sh DESCRIPTION
The
.Nm
driver provides support for
.Tn PCI
and
.Tn PCIe
devices in the kernel and limited access to
.Tn PCI
devices for userland.
.Pp
The
.Nm
driver provides a
.Pa /dev/pci
character device that can be used by userland programs to read and write
.Tn PCI
configuration registers.
Programs can also use this device to get a list of all
.Tn PCI
devices, or all
.Tn PCI
devices that match various patterns.
.Pp
Since the
.Nm
driver provides a write interface for
.Tn PCI
configuration registers, system administrators should exercise caution when
granting access to the
.Nm
device.
If used improperly, this driver can allow userland applications to
crash a machine or cause data loss.
In particular, the driver only allows operations on the opened
.Pa /dev/pci
to modify system state if the file descriptor was opened for writing.
For instance, the
.Dv PCIOCREAD
and
.Dv PCIOCBARMMAP
operations require a writeable descriptor, because reading a configuration
register or a BAR can have function-specific side effects.
.Pp
The
.Nm
driver implements the
.Tn PCI
bus in the kernel.
It enumerates any devices on the
.Tn PCI
bus and gives
.Tn PCI
client drivers the chance to attach to them.
It assigns resources to children when the BIOS does not.
It takes care of routing interrupts when necessary.
It reprobes the unattached
.Tn PCI
children when
.Tn PCI
client drivers are dynamically
loaded at runtime.
The
.Nm
driver also includes support for PCI-PCI bridges,
various platform-specific Host-PCI bridges,
and basic support for
.Tn PCI
VGA adapters.
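.Sh IOCTLS
The following
.Xr ioctl 2
calls are supported by the
.Nm
driver.
They are defined in the header file
.In sys/pciio.h .
.Bl -tag -width 012345678901234
.It PCIOCGETCONF
This
.Xr ioctl 2
takes a
.Va pci_conf_io
structure.
It allows the user to retrieve information on all
.Tn PCI
devices in the system, or on
.Tn PCI
devices matching patterns supplied by the user.
The call may set
.Va errno
to any value specified in either
.Xr copyin 9
or
.Xr copyout 9 .
The
.Va pci_conf_io
structure consists of a number of fields:
.Bl -tag -width match_buf_len
.It pat_buf_len
The length, in bytes, of the buffer filled with user-supplied patterns.
.It num_patterns
The number of user-supplied patterns.
.It patterns
Pointer to a buffer filled with user-supplied patterns.
.Va patterns
is a pointer to
.Va num_patterns
.Va pci_match_conf
structures.
The
.Va pci_match_conf
structure consists of the following elements:
.Bl -tag -width pd_vendor
.It pc_sel
.Tn PCI
domain, bus, slot and function.
.It pd_name
.Tn PCI
device driver name.
.It pd_unit
.Tn PCI
device driver unit number.
.It pc_vendor
.Tn PCI
vendor ID.
.It pc_device
.Tn PCI
device ID.
.It pc_class
.Tn PCI
device class.
.It flags
The flags describe which of the fields the kernel should match against.
A device must match all specified fields in order to be returned.
The match flags are enumerated in the
.Va pci_getconf_flags
enumeration.
The flag names are intended to be self-explanatory and are not
described in detail here.
.El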
.It match_buf_len
Length of the
.Va matches
buffer allocated by the user to hold the results of the
.Dv PCIOCGETCONF
query.
.It num_matches
Number of matches returned by the kernel.
.It matches
Buffer containing matching devices returned by the kernel.
The items in this buffer are of type
.Va pci_conf ,
which consists of the following items:
.Bl -tag -width pc_subvendor
.It pc_sel
.Tn PCI
domain, bus, slot and function.
.It pc_hdr
.Tn PCI
header type.
.It pc_vendor
.Tn PCI
vendor ID.
.It pc_device
.Tn PCI
device ID.
.It pc_subvendor
.Tn PCI
subvendor ID.
.It pc_subdevice
.Tn PCI
subdevice ID.
.It pc_class
.Tn PCI
device class.
.It pc_subclass
.Tn PCI
device subclass.
.It pc_progif
.Tn PCI
device programming interface.
.It pc_revid
.Tn PCI
revision ID.
.It pd_name
Driver name.
.It pd_unit
Driver unit number.
.El
.It offset
The offset is passed in by the user to tell the kernel where it should
start traversing the device list.
The value passed out by the kernel
points to the record immediately after the last one returned.
The user may
pass the value returned by the kernel in subsequent calls to the
.Dv PCIOCGETCONF
ioctl.
If the user does not intend to use the offset, it must be set to zero.
.It generation
.Tn PCI
configuration generation.
This value only needs to be set if the offset is set.
The kernel will compare the current generation number of its internal
device list to the generation passed in by the user to determine whether
its device list has changed since the user last called the
.Dv PCIOCGETCONF
ioctl.
If the device list has changed, a status of
.Va PCI_GETCONF_LIST_CHANGED
will be passed back.
.It status
The status tells the user the disposition of the request for a device list.
The possible status values are:
.Bl -ohang
.It PCI_GETCONF_LAST_DEVICE
This means that there are no more devices in the PCI device list matching
the specified criteria after the
ones returned in the
.Va matches
buffer.
.It PCI_GETCONF_LIST_CHANGED
This status tells the user that the
.Tn PCI
device list has changed since the last call to the
.Dv PCIOCGETCONF
ioctl and that the user must reset the
.Va offset
and
.Va generation
to zero to start over at the beginning of the list.
.It PCI_GETCONF_MORE_DEVS
This tells the user that the supplied buffer was not large enough to hold
all of the remaining devices in the device list that match the criteria.
.It PCI_GETCONF_ERROR
This indicates a general error while servicing the user's request.
If the
.Va pat_buf_len
is not equal to
.Va num_patterns
times
.Fn sizeof "struct pci_match_conf" ,
.Va errno
will be set to
.Er EINVAL .
.El
.El
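.Pp
The status values returned by
.Dv PCIOCGETCONF
amount to a small decision table for the caller.
The following is a minimal sketch of that table, not code from the driver.
The
.Va ex_
names are hypothetical stand-ins so that the fragment compiles without
.In sys/pciio.h ;
real code would switch on the
.Dv PCI_GETCONF_*
constants instead.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for the PCI_GETCONF_* status values described
 * above.  They are local to this sketch and do not claim to match the
 * numeric values in <sys/pciio.h>.
 */
enum ex_status {
	EX_LAST_DEVICE,
	EX_LIST_CHANGED,
	EX_MORE_DEVS,
	EX_ERROR
};

/* What the caller should do after one PCIOCGETCONF call (hypothetical). */
enum ex_action {
	EX_DONE,	/* every matching device has been returned */
	EX_RESTART,	/* zero offset and generation, start over */
	EX_CALL_AGAIN,	/* reissue with the offset/generation passed back */
	EX_FAIL		/* give up and report the error */
};

static enum ex_action
ex_next_action(enum ex_status status)
{
	switch (status) {
	case EX_LAST_DEVICE:
		return (EX_DONE);
	case EX_LIST_CHANGED:
		return (EX_RESTART);
	case EX_MORE_DEVS:
		return (EX_CALL_AGAIN);
	case EX_ERROR:
	default:
		return (EX_FAIL);
	}
}
```

A caller would typically loop on the ioctl until the action is no longer
.Dv EX_CALL_AGAIN .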
.It PCIOCREAD
This
.Xr ioctl 2
reads the
.Tn PCI
configuration registers specified by the passed-in
.Va pci_io
structure.
The
.Va pci_io
structure consists of the following fields:
.Bl -tag -width pi_width
.It pi_sel
A
.Va pcisel
structure which specifies the domain, bus, slot and function the user would
like to query.
If the specific bus is not found, errno will be set to ENODEV and -1 returned
from the ioctl.
.It pi_reg
The
.Tn PCI
configuration registers the user would like to access.
.It pi_width
The width, in bytes, of the data the user would like to read.
This value
may be 1, 2, or 4.
3-byte reads and reads larger than 4 bytes are
not supported.
If an invalid width is passed, errno will be set to EINVAL.
.It pi_data
The data returned by the kernel.
.El
.It PCIOCWRITE
This
.Xr ioctl 2
allows users to write to the
.Tn PCI
configuration registers specified in the passed-in
.Va pci_io
structure.
The
.Va pci_io
structure is described above.
The limitations on data width described for
reading registers, above, also apply to writing
.Tn PCI
configuration registers.
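.Pp
The width rule shared by
.Dv PCIOCREAD
and
.Dv PCIOCWRITE
can be checked in userland before the ioctl is issued.
This is a minimal sketch with hypothetical helper names.
The mask helper additionally assumes that the value of interest sits in
the low-order bytes of a 32-bit data field, which the text above does not
state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Widths accepted for configuration space access: 1, 2, or 4 bytes. */
static bool
cfg_width_ok(int width)
{
	return (width == 1 || width == 2 || width == 4);
}

/*
 * Mask that keeps only the low `width` bytes of a 32-bit value.
 * Assumption of this sketch: narrow accesses use the low-order bytes.
 */
static uint32_t
cfg_width_mask(int width)
{
	assert(cfg_width_ok(width));
	if (width == 4)
		return (0xffffffffu);
	return ((1u << (8 * width)) - 1);
}
```

Rejecting a bad width locally avoids a round trip that would only fail
with EINVAL.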
.It PCIOCATTACHED
This
.Xr ioctl 2
allows users to query if a driver is attached to the
.Tn PCI
device specified in the passed-in
.Va pci_io
structure.
The
.Va pci_io
structure is described above, however, the
.Va pi_reg
and
.Va pi_width
fields are not used.
The status of the device is stored in the
.Va pi_data
field.
A value of 0 indicates no driver is attached, while a value larger than 0
indicates that a driver is attached.
.It PCIOCBARMMAP
This
.Xr ioctl 2
command allows a userspace process to
.Xr mmap 2
a memory-mapped PCI BAR into its address space.
The input parameters and results are passed in the
.Va pci_bar_mmap
structure, which has the following fields:
.Bl -tag -width Vt struct pcise pbm_sel
.It Vt uint64_t pbm_map_base
Reports the established mapping base to the caller.
If
.Va PCIIO_BAR_MMAP_FIXED
flag was specified, then this field must be filled before the call
with the desired address for the mapping.
.It Vt uint64_t pbm_map_length
Reports the mapped length of the BAR, in bytes.
Its
.Vt uint64_t
value is always a multiple of the machine page size.
.It Vt int64_t pbm_bar_length
Reports the length of the BAR as exposed by the device.
.It Vt int pbm_bar_off
Reports the offset from the mapped base to the start of the
first register in the BAR.
.It Vt struct pcisel pbm_sel
Should be filled before the call.
Describes the device to operate on.
.It Vt int pbm_reg
The BAR index to mmap.
.It Vt int pbm_flags
Flags that modify the operation.
See below.
.It Vt int pbm_memattr
The caching attribute for the mapping.
Typical values are
.Dv VM_MEMATTR_UNCACHEABLE
for control register BARs, and
.Dv VM_MEMATTR_WRITE_COMBINING
for frame buffers.
A regular memory-like BAR should be mapped with the
.Dv VM_MEMATTR_DEFAULT
attribute.
.El
.Pp
Currently defined flags are:
.Bl -tag -width PCIIO_BAR_MMAP_ACTIVATE
.It PCIIO_BAR_MMAP_FIXED
The resulting mapping must be established at the address
specified by the
.Va pbm_map_base
member; otherwise the call fails.
.It PCIIO_BAR_MMAP_EXCL
Must be used together with
.Dv PCIIO_BAR_MMAP_FIXED .
If the specified base contains already established mappings, the
operation fails instead of implicitly unmapping them.
.It PCIIO_BAR_MMAP_RW
The requested mapping allows both reading and writing.
Without the flag, a read-only mapping is established.
Note that it is common for the device registers to have side-effects
on reads.
.It PCIIO_BAR_MMAP_ACTIVATE
(Unimplemented) If the BAR is not activated, activate it in the course
of mapping.
Currently, an attempt to mmap an inactive BAR results in an error.
.El
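.Pp
Because mappings are page-granular while a BAR may begin at any address,
the reported map length, BAR length and BAR offset are related by simple
rounding.
The following is a sketch of that relationship under stated assumptions:
the formulas are an illustration rather than the driver's code, and the
.Va ex_
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; loosely mirrors the pbm_* fields. */
struct ex_bar_map {
	uint64_t page_base;	/* BAR address rounded down to a page */
	uint64_t map_length;	/* whole pages, like pbm_map_length */
	uint64_t bar_off;	/* like pbm_bar_off */
};

/*
 * Derive page-granular mapping geometry for a BAR.  page_size must be
 * a non-zero power of two.
 */
static struct ex_bar_map
ex_bar_geometry(uint64_t bar_addr, uint64_t bar_length, uint64_t page_size)
{
	struct ex_bar_map m;
	uint64_t mask = page_size - 1;

	assert(page_size != 0 && (page_size & mask) == 0);
	m.page_base = bar_addr & ~mask;
	m.bar_off = bar_addr - m.page_base;
	/* Round the end of the BAR up to the next page boundary. */
	m.map_length = (m.bar_off + bar_length + mask) & ~mask;
	return (m);
}
```

With this geometry, the first register of the BAR is found at the
mapping base plus
.Va bar_off .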
.It PCIOCBARIO
This
.Xr ioctl 2
command allows users to read from and write to BARs.
The I/O request parameters are passed in a
.Va struct pci_bar_ioreq
structure, which has the following fields:
.Bl -tag -width indent
.It Vt struct pcisel pbi_sel
Describes the device to operate on.
.It Vt int pbi_op
The operation to perform.
Currently supported values are
.Dv PCIBARIO_READ
and
.Dv PCIBARIO_WRITE .
.It Vt uint32_t pbi_bar
The index of the BAR on which to operate.
.It Vt uint32_t pbi_offset
The offset into the BAR at which to operate.
.It Vt uint32_t pbi_width
The size, in bytes, of the I/O operation.
1-byte, 2-byte, 4-byte and 8-byte operations are supported.
.It Vt uint32_t pbi_value
For reads, the value is returned in this field.
For writes, the caller specifies the value to be written in this field.
.Pp
Note that this operation maps and unmaps the corresponding resource and
so is relatively expensive for memory BARs.
The
.Dv PCIOCBARMMAP
.Xr ioctl 2
can be used to create a persistent userspace mapping for such BARs instead.
.El
.El
.Sh LOADER TUNABLES
Tunables can be set at the
.Xr loader 8
prompt before booting the kernel, or stored in
.Xr loader.conf 5 .
The current value of these tunables can be examined at runtime via
.Xr sysctl 8
nodes of the same name.
Unless otherwise specified,
each of these tunables is a boolean that can be enabled by setting the
tunable to a non-zero value.
.Bl -tag -width indent
.It Va hw.pci.clear_bars Pq Defaults to 0
Ignore any firmware-assigned memory and I/O port resources.
This forces the
.Tn PCI
bus driver to allocate resource ranges for memory and I/O port resources
from scratch.
.It Va hw.pci.clear_buses Pq Defaults to 0
Ignore any firmware-assigned bus number registers in PCI-PCI bridges.
This forces the
.Tn PCI
bus driver and PCI-PCI bridge driver to allocate bus numbers for secondary
buses behind PCI-PCI bridges.
.It Va hw.pci.clear_pcib Pq Defaults to 0
Ignore any firmware-assigned memory and I/O port resource windows in PCI-PCI
bridges.
This forces the PCI-PCI bridge driver to allocate memory and I/O port resources
for resource windows from scratch.
.Pp
By default the PCI-PCI bridge driver will allocate windows that
contain the firmware-assigned resources of devices behind the bridge.
In addition, the PCI-PCI bridge driver will suballocate from existing window
regions when possible to satisfy a resource request.
As a result,
both
.Va hw.pci.clear_bars
and
.Va hw.pci.clear_pcib
must be enabled to fully ignore firmware-supplied resource assignments.
.It Va hw.pci.default_vgapci_unit Pq Defaults to -1
By default,
the first
.Tn PCI
VGA adapter encountered by the system is assumed to be the boot display device.
This tunable can be set to choose a specific VGA adapter by specifying the
unit number of the associated
.Xr vgapci 4
device.
.It Va hw.pci.do_power_nodriver Pq Defaults to 0
Place devices into a low power state
.Pq D3
when a suitable device driver is not found.
Can be set to one of the following values:
.Bl -tag -width indent
.It 3
Powers down all
.Tn PCI
devices without a device driver.
.It 2
Powers down most devices without a device driver.
PCI devices with the display, memory, and base peripheral device classes
are not powered down.
.It 1
Similar to a setting of 2 except that storage controllers are also not
powered down.
.It 0
All devices are left fully powered.
.El
.Pp
A
.Tn PCI
device must support power management to be powered down.
Placing a device into a low power state may not reduce power consumption.
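.Pp
The levels of
.Va hw.pci.do_power_nodriver
can be read as a policy function over the device class.
The sketch below is illustrative only and uses hypothetical names.
It assumes the levels run from 3 (all devices) down to 0 (none), and it
uses the base class codes from the PCI specification.
It ignores whether the device supports power management at all.

```c
#include <assert.h>
#include <stdbool.h>

/* PCI base class codes (PCI specification); names are local to this sketch. */
enum {
	EX_CLASS_STORAGE = 0x01,
	EX_CLASS_DISPLAY = 0x03,
	EX_CLASS_MEMORY = 0x05,
	EX_CLASS_BASEPERIPH = 0x08
};

/* Would a driverless device of this base class be powered down? */
static bool
ex_power_down_nodriver(int level, int base_class)
{
	bool spared = (base_class == EX_CLASS_DISPLAY ||
	    base_class == EX_CLASS_MEMORY ||
	    base_class == EX_CLASS_BASEPERIPH);

	assert(level >= 0 && level <= 3);
	switch (level) {
	case 3:		/* all driverless devices */
		return (true);
	case 2:		/* all but display, memory, base peripheral */
		return (!spared);
	case 1:		/* as 2, and storage controllers are spared too */
		return (!spared && base_class != EX_CLASS_STORAGE);
	default:	/* 0: everything stays fully powered */
		return (false);
	}
}
```

For example, a driverless storage controller is powered down at level 2
but left alone at level 1.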
.It Va hw.pci.do_power_resume Pq Defaults to 1
Place
.Tn PCI
devices into the fully powered state when resuming either the system or an
individual device.
Setting this to zero is discouraged as the system will not attempt to power
up non-powered PCI devices after a suspend.
.It Va hw.pci.do_power_suspend Pq Defaults to 1
Place
.Tn PCI
devices into a low power state when suspending either the system or individual
devices.
Normally the D3 state is used as the low power state,
but firmware may override the desired power state during a system suspend.
.It Va hw.pci.enable_ari Pq Defaults to 1
Enable support for PCI-express Alternative RID Interpretation.
This is often used in conjunction with SR-IOV.
.It Va hw.pci.enable_io_modes Pq Defaults to 1
Enable memory or I/O port decoding in a PCI device's command register if it has
firmware-assigned memory or I/O port resources.
The firmware
.Pq BIOS
in some systems does not enable memory or I/O port decoding for some devices
even when it has assigned resources to the device.
This enables decoding for such resources during bus probe.
.It Va hw.pci.enable_msi Pq Defaults to 1
Enable support for Message Signalled Interrupts
.Pq MSI .
MSI interrupts can be disabled by setting this tunable to 0.
.It Va hw.pci.enable_msix Pq Defaults to 1
Enable support for extended Message Signalled Interrupts
.Pq MSI-X .
MSI-X interrupts can be disabled by setting this tunable to 0.
.It Va hw.pci.enable_pcie_ei Pq Defaults to 0
Enable support for PCI-express Electromechanical Interlock.
.It Va hw.pci.enable_pcie_hp Pq Defaults to 1
Enable support for native PCI-express HotPlug.
.It Va hw.pci.honor_msi_blacklist Pq Defaults to 1
MSI and MSI-X interrupts are disabled for certain chipsets known to have
broken MSI and MSI-X implementations when this tunable is set.
It can be set to zero to permit use of MSI and MSI-X interrupts if the
chipset match is a false positive.
.It Va hw.pci.iov_max_config Pq Defaults to 1MB
The maximum amount of memory permitted for the configuration parameters
used when creating Virtual Functions via SR-IOV.
This tunable can also be changed at runtime via
.Xr sysctl 8 .
.It Va hw.pci.realloc_bars Pq Defaults to 0
Attempt to allocate a new resource range during the initial device scan
for any memory or I/O port resources with firmware-assigned ranges that
conflict with another active resource.
.It Va hw.pci.usb_early_takeover Pq Defaults to 1 on Tn amd64 and Tn i386
Disable legacy device emulation of USB devices during the initial device
scan.
Set this tunable to zero to use USB devices via legacy emulation when
using a custom kernel without USB controller drivers.
.It Va hw.pci<D>.<B>.<S>.INT<P>.irq
These tunables can be used to override the interrupt routing for legacy
PCI INTx interrupts.
Unlike other tunables in this list,
these do not have corresponding sysctl nodes.
The tunable name includes the address of the PCI device as well as the
pin of the desired INTx IRQ to override:
.Bl -tag -width indent
.It <D>
The domain
.Pq or segment
of the PCI device in decimal.
.It <B>
The bus address of the PCI device in decimal.
.It <S>
The slot of the PCI device in decimal.
.It <P>
The interrupt pin of the PCI slot to override.
One of
.Ql A ,
.Ql B ,
.Ql C ,
or
.Ql D .
.El
.Pp
The value of the tunable is the raw IRQ value to use for the INTx interrupt
pin identified by the tunable name.
Mapping of IRQ values to platform interrupt sources is machine dependent.
.El
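.Pp
The name of an INTx override tunable is assembled from the device address
and the pin letter.
The helper below is a hypothetical sketch that builds such a name for use
in
.Xr loader.conf 5 ;
it assumes pin letters A through D, as in the PCI INTA# to INTD# lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "hw.pci<D>.<B>.<S>.INT<P>.irq" into buf.
 * Returns 0 on success, -1 on a bad pin or a too-small buffer.
 */
static int
ex_intx_tunable(char *buf, size_t len, int domain, int bus, int slot,
    char pin)
{
	int n;

	assert(buf != NULL && len > 0);
	/* strchr() also matches the terminating NUL, so reject it first. */
	if (pin == '\0' || strchr("ABCD", pin) == NULL)
		return (-1);
	n = snprintf(buf, len, "hw.pci%d.%d.%d.INT%c.irq",
	    domain, bus, slot, pin);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (0);
}
```

The resulting string is only the tunable name; the IRQ value assigned to
it remains machine dependent.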
.Sh DEVICE WIRING
You can wire the device unit at a given location with device.hints.
Entries of the form
.Va hint.<name>.<unit>.at="pci<B>:<S>:<F>"
or
.Va hint.<name>.<unit>.at="pci<D>:<B>:<S>:<F>"
will force the driver
.Va name
to probe and attach at unit
.Va unit
for any PCI device found to match the specification, where:
.Bl -tag -width indent
.It <D>
The domain
.Pq or segment
of the PCI device in decimal.
Defaults to 0 if unspecified.
.It <B>
The bus address of the PCI device in decimal.
.It <S>
The slot of the PCI device in decimal.
.It <F>
The function of the PCI device in decimal.
.El
.Pp
The code to do the matching requires an exact string match.
Do not specify the angle brackets
.Pq < >
in the hints file.
Wiring multiple devices to the same
.Va name
and
.Va unit
produces undefined results.
.Ss Examples
Given the following lines in
.Pa /boot/device.hints :
.Cd hint.nvme.3.at="pci6:0:0"
.Cd hint.igb.8.at="pci14:0:0"
If there is a device that supports
.Xr igb 4
at PCI bus 14 slot 0 function 0,
then it will be assigned igb8 for probe and attach.
Likewise, if there is an
.Xr nvme 4
card at PCI bus 6 slot 0 function 0,
then it will be assigned nvme3 for probe and attach.
If another type of card is in either of these locations, the name and
unit of that card will be the default names and will be unaffected by
these hints.
If other igb or nvme cards are located elsewhere, they will be
assigned their unit numbers sequentially, skipping the unit numbers
that have 'at' hints.
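.Pp
Since the match is an exact string comparison, hint lines are best
generated rather than typed by hand.
The helper below is a hypothetical sketch that formats one wiring hint.
It emits the three-part location when the domain is 0, matching the
examples above, and the four-part form otherwise.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a device.hints wiring line into buf.
 * Returns 0 on success, -1 on an empty name or a too-small buffer.
 */
static int
ex_wire_hint(char *buf, size_t len, const char *name, int unit,
    int domain, int bus, int slot, int func)
{
	int n;

	assert(buf != NULL && len > 0);
	if (name == NULL || strlen(name) == 0)
		return (-1);
	if (domain == 0)
		n = snprintf(buf, len, "hint.%s.%d.at=\"pci%d:%d:%d\"",
		    name, unit, bus, slot, func);
	else
		n = snprintf(buf, len, "hint.%s.%d.at=\"pci%d:%d:%d:%d\"",
		    name, unit, domain, bus, slot, func);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (0);
}
```

Called with the name nvme, unit 3, domain 0, bus 6, slot 0 and function 0,
it reproduces the first hint line shown above.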
.Sh FILES
.Bl -tag -width /dev/pci -compact
.It Pa /dev/pci
Character device for the
.Nm
driver.
.El
.Sh SEE ALSO
.Xr pciconf 8
.Sh HISTORY
The
.Nm
driver (not the kernel's
.Tn PCI
support code) first appeared in
and was written by Stefan Esser and Garrett Wollman.
Support for device listing and matching was re-implemented by
Kenneth Merry, and first appeared in
.Sh AUTHORS
.An Kenneth Merry Aq Mt ken@FreeBSD.org
.Sh BUGS
It is not possible for users to specify an accurate offset into the device
list without calling the
.Dv PCIOCGETCONF
at least once, since they have no way of knowing the current generation
number otherwise.
This probably is not a serious problem, though, since
users can easily narrow their search by specifying a pattern or patterns
for the kernel to match against.