These pages are hopelessly outdated! The latest sources can only be obtained from CVS at SourceForge (for now, anyway).

Network Block Device

Network Block Device (TCP version)

What is it: With this thing compiled into your kernel, Linux can use a remote server as one of its block devices. Every time the client computer wants to read /dev/nd0, it sends a request to the server over TCP, which replies with the data requested. This can be used by stations with low disk space (or even diskless stations, if you boot from floppy) to borrow disk space from other computers. Unlike NFS, it is possible to put any file system on it. But (also unlike NFS), if someone has mounted NBD read/write, you must make sure that no one else has it mounted.

Limitations: It is impossible to use NBD as a root file system, as a user-land program is required to start it (though you might get away with initrd; I never tried that). (Patches to change this are welcome.) It also allows you to run a read-only block device in user-land (making server and client physically the same computer, communicating over loopback). Please note that read-write NBD with client and server on the same machine is a bad idea: expect a deadlock within seconds (this may vary between kernel versions; maybe one sunny day it will even be safe). More generally, it is a bad idea to create a loop in the 'rw mounts graph': if machineA is using a device from machineB read-write, it is a bad idea to use a device on machineB from machineA.

Read-write NBD with client and server on the same machine has a rather fundamental problem: when the system is short of memory, it tries to write back dirty pages. So the nbd client asks the nbd server to write back data, but as nbd-server is a user-land process, it may need memory to fulfill the request. That way lies the deadlock.

Current state: It currently works. Network block device seems to be pretty stable. I originally thought that it was impossible to swap over TCP. That turned out not to be true: swapping over TCP now works and seems to be deadlock-free.

If you want swapping to work, first get NBD working. (You'll have to run mkswap on the server; mkswap tries to fsync, which will fail.) Now you have a version which mostly works. Ask me for kreclaimd if you see deadlocks.

Network block device has been included in the standard (Linus') kernel tree since 2.1.101.

I've successfully run raid5 and md over NBD. (A pretty recent version is required to do so, however.)

Devices: The network block device uses major 43, minors 0..n (where n is configurable in nbd.h). Create these files with mknod as needed. After that, your ls -l /dev/ should look like:

brw-rw-rw-   1 root     root      43,   0 Apr 11 00:28 nd0
brw-rw-rw-   1 root     root      43,   1 Apr 11 00:28 nd1

These commands should do the job:

mknod /dev/nd0 b 43 0
mknod /dev/nd1 b 43 1
mknod /dev/nd2 b 43 2
mknod /dev/nd3 b 43 3

Disclaimer: If you try to export a device with already existing data, be prepared to lose it. This beast has already killed one partition (not mine :-). Make sure you test your setup at least a little before putting it into production, and don't do your tests on important filesystems. It might even work.

Client/server documentation

A few client/server versions are included here. Look into the C files for more documentation. Note that not all client/server versions are compatible with all other client/server and kernel versions. Good luck.

Example of usage; do this on one console (of course you could use a raw partition instead of the /tmp/delme file):

root@bug:/tmp# cat /dev/zero > /tmp/delme

root@bug:/tmp# ls -l /tmp/delme
-rw-r--r--   1 root     root     15552512 Mar 16 22:40 /tmp/delme
root@bug:/tmp# cd ~pavel/WWW/nbd
root@bug:/home/pavel/WWW/nbd# ./nbd-server 1024 /tmp/delme

And this on the second one:

root@bug:/home/pavel/WWW/nbd# ./nbd-client other_machine 1024 /dev/nd0 [note note note: DON'T TRY THIS ON LOCALHOST!]
Negotiation: ..size = 15552512
root@bug:/home/pavel/WWW/nbd# mke2fs /dev/nd0
mke2fs 1.10, 24-Apr-97 for EXT2 FS 0.5b, 95/08/09
Linux ext2 filesystem format
Filesystem label=
3808 inodes, 15188 blocks
759 blocks (5.00%) reserved for the super user
First data block=1
Block size=1024 (log=0)
Fragment size=1024 (log=0)
2 block groups
8192 blocks per group, 8192 fragments per group
1904 inodes per group
Superblock backups stored on blocks:

Writing inode tables: done
Writing superblocks and filesystem accounting information: done
root@bug:/home/pavel/WWW/nbd# mount /dev/nd0 /mnt
root@bug:/home/pavel/WWW/nbd# cd /mnt
root@bug:/mnt# ls -al
total 14
drwxr-xr-x   3 root     root         1024 Mar 16 22:41 ./
drwxr-xr-x  25 root     root         1024 Feb 15 15:45 ../
drwxr-xr-x   2 root     root        12288 Mar 16 22:41 lost+found/
root@bug:/mnt# mkdir x
root@bug:/mnt# cd /
root@bug:/# umount /mnt
Kernel call returned.Closing: que, sock, done
root@bug:/# e2fsck -f /tmp/delme
e2fsck 1.10, 24-Apr-97 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/tmp/delme: 12/3808 files (0.0% non-contiguous), 499/15188 blocks


New versions

Historic versions.

Old versions (you need these on 2.1.115 and earlier; you do not want to do this. This is history.)

Protocol: This is true for the 'old' protocol, i.e. the one in the 2.1.101 Linus tree. Look at nbd.h to see what you are actually using. A user-land program passes a file handle with a connected TCP socket to the kernel driver. This way, the kernel does not have to care about connecting etc. The protocol used is rather simple: if the driver is asked to read from or write to the block device, it sends a 'request' packet of the following form (all data are in network byte order):

  __u32 magic;        must be equal to NBD_REQUEST_MAGIC (see nbd.h)
  __u32 from;	      position in bytes to read from / write to
  __u32 len;	      number of bytes to be read / written
  __u64 handle;	      handle of operation
  __u32 type;	      0 = read
		      1 = write
  ...		      in case of a write operation, this is 
		      immediately followed by len bytes of data
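As a concrete illustration (a sketch, not the distributed client code), this is how the 24-byte request header above could be packed into its big-endian wire layout in C. The field order follows the struct listed above; the magic constant 0x25609513 is NBD_REQUEST_MAGIC, but check your own nbd.h, since the layout has changed between versions.

```c
/* Pack the "old-protocol" NBD request header described above into its
 * on-wire layout: magic, from, len, handle, type, all big-endian.
 * Field order follows this document; verify it against your nbd.h. */
#include <stdint.h>

#define NBD_REQUEST_MAGIC 0x25609513u   /* from nbd.h */

/* Store a 32-bit value in network byte order (big-endian). */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}

/* Store a 64-bit value in network byte order. */
static void put_be64(unsigned char *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/* Fill the 24-byte header; for a write (type == 1) the caller then
 * sends len bytes of data immediately after this header. */
static void pack_request(unsigned char buf[24], uint32_t from,
                         uint32_t len, uint64_t handle, uint32_t type)
{
    put_be32(buf,      NBD_REQUEST_MAGIC);
    put_be32(buf + 4,  from);
    put_be32(buf + 8,  len);
    put_be64(buf + 12, handle);
    put_be32(buf + 20, type);
}
```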

Upon completion of the operation, the server responds with a 'reply' packet of the following structure:

  __u32 magic;        must be equal to NBD_REPLY_MAGIC (see nbd.h)
  __u64 handle;	      handle copied from the request
  __u32 error;	      0 = operation completed successfully,
		      else error code
  ...		      in case of a read operation with no error, 
                      this is immediately followed by len bytes of data
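For the other direction, here is a matching sketch (again an illustration, not the distributed sources) that parses the 16-byte reply header from the same big-endian layout. The magic constant 0x67446698 is NBD_REPLY_MAGIC; as above, check it against your nbd.h.

```c
/* Parse the NBD reply header described above: magic, handle, error,
 * all big-endian on the wire. */
#include <stdint.h>

#define NBD_REPLY_MAGIC 0x67446698u   /* from nbd.h */

/* Read a 32-bit big-endian value. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 64-bit big-endian value. */
static uint64_t get_be64(const unsigned char *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Returns 0 and fills *handle / *error on a valid reply, -1 on bad
 * magic.  On a successful read (error == 0), len bytes of data follow
 * this header on the socket. */
static int parse_reply(const unsigned char buf[16],
                       uint64_t *handle, uint32_t *error)
{
    if (get_be32(buf) != NBD_REPLY_MAGIC)
        return -1;
    *handle = get_be64(buf + 4);
    *error  = get_be32(buf + 12);
    return 0;
}
```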

This page was created by Pavel Machek.