.\" Copyright (c) 2003, David G. Lawrence
.\" All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice unmodified, this list of conditions, and the following
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.Nd send a file to a socket
.Fa "int fd" "int s" "off_t offset" "size_t nbytes"
.Fa "struct sf_hdtr *hdtr" "off_t *sbytes" "int flags"
sends a regular file or shared memory object specified by descriptor
out a stream socket specified by descriptor
argument specifies where to begin in the file.
fall beyond the end of file, the system will return
success and report 0 bytes sent as described below.
argument specifies how many bytes of the file should be sent, with 0 having the special
meaning of sending until the end of file has been reached.
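.Pp
The offset and length rules above can be modeled with a small helper.
This is only a sketch: the function name is hypothetical and not part of
any system interface, headers and trailers are ignored, and clamping a
request that runs past the end of file is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a system API): models how many file bytes a
 * request covers, given the file size, the starting offset and nbytes.
 */
static int64_t
expected_file_bytes(int64_t file_size, int64_t offset, uint64_t nbytes)
{
	assert(file_size >= 0 && offset >= 0);

	/* Starting at or beyond end of file: success, 0 bytes sent. */
	if (offset >= file_size)
		return (0);

	int64_t remaining = file_size - offset;

	/* nbytes == 0 means "send until end of file". */
	if (nbytes == 0)
		return (remaining);

	/* Assumption: a request longer than the file stops at end of file. */
	if (nbytes > (uint64_t)remaining)
		return (remaining);

	return ((int64_t)nbytes);
}
```

For a 100-byte file, an offset of 40 with nbytes of 0 covers the last 60
bytes, while an offset of 200 covers none.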
An optional header and/or trailer can be sent before and after the file data by specifying
.Vt "struct sf_hdtr" ,
which has the following structure:
.Bd -literal -offset indent -compact
struct sf_hdtr {
	struct iovec *headers;	/* pointer to header iovecs */
	int hdr_cnt;		/* number of header iovecs */
	struct iovec *trailers;	/* pointer to trailer iovecs */
	int trl_cnt;		/* number of trailer iovecs */
};
.Ed
system call for information on the iovec structure.
The number of iovecs in these
arrays is specified by
the system will write the total number of bytes sent on the socket to the
variable pointed to by
The least significant 16 bits of
argument are a bitmap of these values:
.Bl -tag -offset indent -width "SF_USER_READAHEAD"
instead of blocking when a busy page is encountered.
This rare situation can happen if some other process is currently working
with the same region of the file.
It is advised to retry the operation after a short period.
had a slightly different notion.
to run I/O operations if an invalid (not cached) page is encountered,
thus avoiding blocking on I/O.
sending files off the
filesystem does not block on I/O
.Sx IMPLEMENTATION NOTES
), so the condition no longer applies.
However, it is safe if an application utilizes
performs the same action as it did in
in a different context.
The data sent to the socket will not be cached by the virtual memory system,
and will be freed directly to the pool of free pages.
sleeps until the network stack no longer references the VM pages
of the file, making subsequent modifications to it safe.
Please note that this is not a guarantee that the data has actually
.It Dv SF_USER_READAHEAD
has some internal heuristics to do readahead when sending data.
to override any heuristically calculated readahead and use exactly the
application-specified readahead.
.Sx SETTING READAHEAD
for more details on readahead.
When using a socket marked for non-blocking I/O,
may send fewer bytes than requested.
In this case, the number of bytes successfully
written is returned in
.Sh SETTING READAHEAD
uses internal heuristics based on request size and file system layout
Additionally, an application may request extra readahead.
The most significant 16 bits of
specify the number of pages that
may read ahead when reading the file.
is provided to combine the readahead amount and flags.
An example specifying a readahead of 16 pages and
.Bd -literal -offset indent -compact
	SF_FLAGS(16, SF_NOCACHE)
.Ed
will use either the application-specified readahead or the internally calculated one,
.Dv SF_USER_READAHEAD
turns off any heuristics and sets the maximum possible readahead length to
the number of pages specified via flags.
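.Pp
The bit layout described in this section can be illustrated with a
self-contained sketch.
All names prefixed with ex_ or EX_ are hypothetical stand-ins and not
system definitions; real code should use the SF_FLAGS macro and the SF_*
constants from the system headers.
The sketch also assumes that, without SF_USER_READAHEAD, the larger of
the two readahead values is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants mirroring the layout described in the text. */
#define EX_FLAG_MASK	0xffffu	/* low 16 bits: flag bitmap */
#define EX_RA_SHIFT	16	/* high 16 bits: readahead in pages */

/* Combine a readahead amount (in pages) with a flag bitmap. */
static inline uint32_t
ex_pack_flags(uint32_t readahead_pages, uint32_t flags)
{
	assert(readahead_pages <= 0xffffu);
	assert((flags & ~EX_FLAG_MASK) == 0);
	return ((readahead_pages << EX_RA_SHIFT) | flags);
}

/* Extract the readahead amount from a packed value. */
static inline uint32_t
ex_readahead(uint32_t packed)
{
	return (packed >> EX_RA_SHIFT);
}

/* Extract the flag bitmap from a packed value. */
static inline uint32_t
ex_flag_bits(uint32_t packed)
{
	return (packed & EX_FLAG_MASK);
}

/*
 * Model of the selection rule: with the user-readahead flag set, the
 * application value is used exactly; otherwise (assumption) the larger
 * of the application value and the heuristic value is used.
 */
static inline uint32_t
ex_effective_readahead(uint32_t user_ra, uint32_t heuristic_ra,
    int user_flag_set)
{
	if (user_flag_set)
		return (user_ra);
	return (user_ra > heuristic_ra ? user_ra : heuristic_ra);
}
```

Packing a readahead of 16 pages with a flag bit of 0x10 yields 0x100010
in this model, and both parts can be recovered with the two accessors.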
.Sh IMPLEMENTATION NOTES
does not block on disk I/O when it sends a file off the
The syscall returns success before the actual I/O completes, and data
is put into the socket later unattended.
However, the order of data in the socket is preserved, so it is safe
to do further writes to the socket.
is "zero-copy", meaning that it has been optimized so that copying of the file data is avoided.
.Ss physical paging buffers
uses the vnode pager to read file pages into memory.
The pager uses a pool of physical buffers to run its I/O operations.
When the system runs out of pbufs, sendfile will block and report state
The size of the pool can be tuned with
tunable and can be checked with
OID of the same name at runtime.
.Ss sendfile(2) buffers
On some architectures, this system call internally uses a special
.Pq Vt "struct sf_buf"
to handle sending file data to the client.
If the sending socket is
blocking, and there are not enough
will block and report a state of
If the sending socket is non-blocking and there are not enough
buffers available, the call will block and wait for the
necessary buffers to become available before finishing the call.
allocated should be proportional to the number of nmbclusters used to
send data to a client via
Tune accordingly to avoid blocking!
Busy installations that make extensive use of
may want to increase these values to be in line with their
.Va kern.ipc.nmbclusters
buffers available is determined at boot time by either the
kernel configuration tunable.
.Va kern.ipc.nsfbufsused
.Va kern.ipc.nsfbufspeak
variables show current and peak
buffer usage, respectively.
These values may also be viewed through
doesn't exist, your architecture does not need to use
buffers because their task can be efficiently performed
by the generic virtual memory structures.
The socket is marked for non-blocking I/O and not all data was sent due to
the socket buffer being filled.
If specified, the number of bytes successfully sent will be returned in
is not a valid file descriptor.
is not a valid socket descriptor.
A busy page was encountered and
Partial data may have been sent.
An invalid address was specified for an argument.
before it could be completed.
If specified, the number
of bytes successfully sent will be returned in
is not a regular file.
is not a SOCK_STREAM type socket.
An error occurred while reading from
.It Bq Er ENOTCAPABLE
argument has insufficient rights.
The system was unable to allocate an internal buffer.
points to an unconnected socket.
The file system for descriptor
The socket peer has closed the connection.
.%T A Portable Kernel Abstraction for Low-Overhead Ephemeral Mapping Management
.%J The Proceedings of the 2005 USENIX Annual Technical Conference
This manual page first appeared in
support for sending shared memory descriptors was introduced.
a non-blocking implementation was introduced.
The initial implementation of
and this manual page were written by
.An David G. Lawrence Aq Mt dg@dglawrence.com .
implementation was written by
.An Gleb Smirnoff Aq Mt glebius@FreeBSD.org .
system call will not fail, i.e., return
if provided an invalid address for