やるきなし

2016/02/20 22:59 / overlayfs over btrfs (rm -r; Invalid argument)

I tested directory removal in overlayfs on top of several filesystems (ext4, xfs, and btrfs), as shown below. It seems better not to use btrfs for the upper directory of overlayfs (and hence for the work directory; *1).

*1: overlayfs: workdir and upperdir must reside under the same mount
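Because of *1, a btrfs upper directory also means a btrfs work directory. The sketch below is a rough user-space pre-check for that constraint. It assumes GNU coreutils stat, whose %m format prints the mount point, and same_mount is a hypothetical name. It only compares the mount points stat reports, which may not match the kernel's own check in every case (for example with bind mounts).

```shell
#!/bin/sh
# same_mount DIR1 DIR2 (hypothetical helper): succeed if both paths report
# the same mount point.
# Assumption: GNU coreutils stat, where %m prints the mount point.
# Coarser than the kernel's check; bind mounts may not be told apart.
same_mount() {
    m1=$(stat -c %m -- "$1" 2>/dev/null) || return 1
    m2=$(stat -c %m -- "$2" 2>/dev/null) || return 1
    [ "$m1" = "$m2" ]
}

# Example with the directories used by /tmp/test2.sh:
# same_mount /tmp/test/upper /tmp/test/work && echo "same mount"
```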

% cat <<EOF > /tmp/test.sh
cd /tmp/test/ || exit 1
rm -rf *
mkdir -p lower/dir
mkdir -p upper/
mkdir -p work/
mkdir -p overlay/
touch lower/dir/a
touch lower/dir/b
touch lower/dir/c
EOF

% cat <<EOF > /tmp/test2.sh
rm -f /tmp/test.img
fallocate -l 100M /tmp/test.img
if ! \$1 /tmp/test.img > /dev/null 2>&1 ; then
  echo failed
  exit 1
fi
mkdir -p /tmp/test
sudo mount /tmp/test.img /tmp/test
sudo chmod 777 /tmp/test
sh /tmp/test.sh
sudo mount overlayfs -t overlay -o lowerdir=/tmp/test/lower,upperdir=/tmp/test/upper,workdir=/tmp/test/work  /tmp/test/overlay
rm -r /tmp/test/overlay/dir
ls -Rla /tmp/test/upper
sudo umount /tmp/test/overlay
sudo umount /tmp/test
EOF
% sh /tmp/test2.sh /sbin/mkfs.ext4
/tmp/test/upper:
total 2
drwxr-xr-x 2 myn     users 1024 Feb 20 23:25 .
drwxrwxrwx 6 root    root  1024 Feb 20 23:25 ..
c--------- 1 myn     users 0, 0 Feb 20 23:25 dir
% sh /tmp/test2.sh /sbin/mkfs.xfs
/tmp/test/upper:
total 0
drwxr-xr-x 2 myn     users   17 Feb 20 23:25 .
drwxrwxrwx 6 root    root    59 Feb 20 23:25 ..
c--------- 1 myn     users 0, 0 Feb 20 23:25 dir
% sh /tmp/test2.sh /bin/mkfs.btrfs
rm: cannot remove '/tmp/test/overlay/dir': Invalid argument
/tmp/test/upper:
total 16
drwxr-xr-x 1 myn     users  6 Feb 20 23:25 .
drwxrwxrwx 1 root    root  42 Feb 20 23:25 ..
drwxr-xr-x 1 myn     users  6 Feb 20 23:25 dir

/tmp/test/upper/dir:
total 0
drwxr-xr-x 1 myn     users    6 Feb 20 23:25 .
drwxr-xr-x 1 myn     users    6 Feb 20 23:25 ..
c--------- 1 myn     users 0, 0 Feb 20 23:25 a
c--------- 1 myn     users 0, 0 Feb 20 23:25 b
c--------- 1 myn     users 0, 0 Feb 20 23:25 c

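With ext4 and xfs, upper/dir ends up as a whiteout (a character device with device number 0, 0), as expected. With btrfs, however, rm fails with "Invalid argument", and upper/dir is left as an ordinary directory that contains whiteouts for a, b, and c.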
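A whiteout is the character device with device number 0, 0 that appears in the ls output above. The hypothetical helper below checks for one from the shell. It assumes GNU coreutils stat, whose %t and %T formats print the major and minor device numbers in hex.

```shell
#!/bin/sh
# is_whiteout PATH (hypothetical helper): succeed if PATH is an overlayfs
# whiteout, i.e. a character device (not a symlink to one) with device
# number 0, 0.
# Assumption: GNU coreutils stat, where %t/%T print major/minor in hex.
is_whiteout() {
    [ -L "$1" ] && return 1          # [ -c ] would follow a symlink
    [ -c "$1" ] || return 1          # not a character device
    [ "$(stat -c '%t:%T' -- "$1")" = "0:0" ]
}

# Example: list the whiteouts left in the upper directory of the test above.
# find /tmp/test/upper -type c | while IFS= read -r p; do
#     is_whiteout "$p" && echo "whiteout: $p"
# done
```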

I traced overlayfs's behavior with dynamic debug, which is controlled through debugfs (see Documentation/dynamic-debug-howto.txt), as follows.

% sudo sh -c 'echo -n "module overlay +p" > /sys/kernel/debug/dynamic_debug/control'
% sh /tmp/test2.sh /sbin/mkfs.ext4 > /dev/null
% dmesg
[208788.770416] mkdir(work/work, 040000) = 0
[208788.773640] mkdir(work/#ffff8800bb3c5618, 040000) = 0
[208788.773648] rename2(work/#ffff8800bb3c5618, upper/dir, 0x0)
[208788.773664] whiteout(work/#ffff88007579a9d8) = 0
[208788.773666] rename2(work/#ffff88007579a9d8, dir/b, 0x0)
[208788.773693] whiteout(work/#ffff8800bb3b3798) = 0
[208788.773695] rename2(work/#ffff8800bb3b3798, dir/a, 0x0)
[208788.773712] whiteout(work/#ffff880094ef2cd8) = 0
[208788.773713] rename2(work/#ffff880094ef2cd8, dir/c, 0x0)
[208788.773753] mkdir(work/#ffff8800bb3c5618, 040755) = 0
[208788.773765] setxattr(work/#ffff8800bb3c5618, "trusted.overlay.opaque", "y", 0x0) = 0
[208788.773770] rename2(work/#ffff8800bb3c5618, upper/dir, 0x2)
[208788.773780] unlink(#ffff8800bb3c5618/b) = 0
[208788.773789] unlink(#ffff8800bb3c5618/a) = 0
[208788.773794] unlink(#ffff8800bb3c5618/c) = 0
[208788.773821] rmdir(work/#ffff8800bb3c5618) = 0
[208788.773828] whiteout(work/#ffff8800bb3c5618) = 0
[208788.773829] rename2(work/#ffff8800bb3c5618, upper/dir, 0x2)
[208788.773837] rmdir(work/#ffff8800bb3c5618) = 0
% sh /tmp/test2.sh /bin/mkfs.btrfs > /dev/null 2>&1
% dmesg
[208822.019517] mkdir(work/work, 040000) = 0
[208822.022430] mkdir(work/#ffff88009d74b858, 040000) = 0
[208822.022439] rename2(work/#ffff88009d74b858, upper/dir, 0x0)
[208822.022520] whiteout(work/#ffff88009d68e558) = 0
[208822.022523] rename2(work/#ffff88009d68e558, dir/a, 0x0)
[208822.022564] whiteout(work/#ffff8800aea6e9d8) = 0
[208822.022565] rename2(work/#ffff8800aea6e9d8, dir/b, 0x0)
[208822.022592] whiteout(work/#ffff8800877e83d8) = 0
[208822.022593] rename2(work/#ffff8800877e83d8, dir/c, 0x0)
[208822.022634] mkdir(work/#ffff88009d74b858, 040755) = 0
[208822.022640] setxattr(work/#ffff88009d74b858, "trusted.overlay.opaque", "y", 0x0) = 0
[208822.022643] rename2(work/#ffff88009d74b858, upper/dir, 0x2)
[208822.022657] ...rename2(work/#ffff88009d74b858, upper/dir, ...) = -22
[208822.022669] rmdir(work/#ffff88009d74b858) = 0
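In the logs above, a "...rename2(...) = -22" line appears only in the failing btrfs run, right after the rename2 call it refers to. The sketch below pairs each such failure line with the flags of the preceding rename2 call, so that a long trace can be filtered with "dmesg | failed_renames". failed_renames is a hypothetical name, and it assumes exactly the log format shown above.

```shell
#!/bin/sh
# failed_renames (hypothetical helper): read overlay dynamic-debug lines
# on stdin and print every failed rename2 together with the flags of the
# rename2 call logged just before it.
# Assumption: the log format shown above, where a failure is logged as
# "...rename2(SRC, DST, ...) = ERR" after the "rename2(SRC, DST, FLAGS)" line.
failed_renames() {
    awk '
        BEGIN { flags = "?" }
        # A rename2 call: keep its last field, e.g. "0x2)" becomes "0x2".
        index($0, "] rename2(") {
            flags = $NF
            sub(/[)]$/, "", flags)
            next
        }
        # A failure report: drop the timestamp and the leading "...".
        index($0, "...rename2(") {
            print substr($0, index($0, "...") + 3) " (flags " flags ")"
        }
    '
}
```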

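In the rename2 lines, the flags value 0x2 is RENAME_EXCHANGE (1 << 1) (see include/uapi/linux/fs.h), and the return value -22 is -EINVAL (see include/uapi/asm-generic/errno-base.h). The same exchange succeeds on ext4, and Documentation/filesystems/vfs.txt (quoted below) says that a filesystem must return -EINVAL for rename flags it does not support, so it seems that btrfs does not support RENAME_EXCHANGE.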

  rename2: this has an additional flags argument compared to rename.
        If no flags are supported by the filesystem then this method
        need not be implemented.  If some flags are supported then the
        filesystem must return -EINVAL for any unsupported or unknown
        flags.  Currently the following flags are implemented:
        (1) RENAME_NOREPLACE: this flag indicates that if the target
        of the rename exists the rename should fail with -EEXIST
        instead of replacing the target.  The VFS already checks for
        existence, so for local filesystems the RENAME_NOREPLACE
        implementation is equivalent to plain rename.
        (2) RENAME_EXCHANGE: exchange source and target.  Both must
        exist; this is checked by the VFS.  Unlike plain rename,
        source and target may be of different type.
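The two numbers in the failing trace can also be decoded mechanically. The hypothetical helpers below use the RENAME_* bit values from include/uapi/linux/fs.h (RENAME_NOREPLACE = 1 << 0, RENAME_EXCHANGE = 1 << 1, RENAME_WHITEOUT = 1 << 2) and a few entries from include/uapi/asm-generic/errno-base.h; only the values relevant to rename are listed.

```shell
#!/bin/sh
# decode_rename_flags FLAGS (hypothetical helper): print the names of the
# RENAME_* bits set in FLAGS; accepts decimal or hex such as 0x2.
# Bit values as defined in include/uapi/linux/fs.h.
decode_rename_flags() (
    f=$(( $1 ))
    names=""
    [ $(( f & 1 )) -ne 0 ] && names="$names RENAME_NOREPLACE"   # 1 << 0
    [ $(( f & 2 )) -ne 0 ] && names="$names RENAME_EXCHANGE"    # 1 << 1
    [ $(( f & 4 )) -ne 0 ] && names="$names RENAME_WHITEOUT"    # 1 << 2
    echo "${names# }"
)

# errno_name N (hypothetical helper): name a kernel return value such as -22.
# Only a few entries from include/uapi/asm-generic/errno-base.h are listed.
errno_name() (
    n=${1#-}
    case $n in
        0)  echo "OK" ;;
        17) echo EEXIST ;;
        18) echo EXDEV ;;
        22) echo EINVAL ;;
        *)  echo "errno $n (not in this table)" ;;
    esac
)

# Decode the failing call from the btrfs trace above.
echo "flags 0x2 = $(decode_rename_flags 0x2), return -22 = -$(errno_name -22)"
```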

Note that I tested the above with the Debian package linux-image-4.3.0-1-amd64 (version 4.3.5-1) and with my own build of Linux 4.4.2.
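Whether RENAME_EXCHANGE works depends on the kernel and filesystem at hand, so it can be useful to test it directly without setting up overlayfs. The sketch below assumes GNU coreutils 9.5 or later, whose mv --exchange option requests an atomic exchange and is assumed to fail rather than fall back when the filesystem rejects it; that option postdates this post, and probe_rename_exchange is a hypothetical name.

```shell
#!/bin/sh
# probe_rename_exchange DIR (hypothetical helper): try to atomically swap
# two files inside DIR.
#   returns 0 - the exchange worked
#   returns 1 - mv --exchange failed (e.g. the filesystem rejected it)
#   returns 2 - this mv has no --exchange option, or DIR is unusable
# Assumption: GNU coreutils >= 9.5 provides mv --exchange.
probe_rename_exchange() (
    mv --help 2>/dev/null | grep -q -e '--exchange' || exit 2
    t=$(mktemp -d "$1/rexch.XXXXXX" 2>/dev/null) || exit 2
    echo a > "$t/a"
    echo b > "$t/b"
    if mv --exchange -T "$t/a" "$t/b" 2>/dev/null &&
       [ "$(cat "$t/a")" = b ] && [ "$(cat "$t/b")" = a ]; then
        rc=0   # the two names now point at each other's former contents
    else
        rc=1
    fi
    rm -rf "$t"
    exit "$rc"
)

# Example, after mounting an image on /tmp/test as in /tmp/test2.sh:
# probe_rename_exchange /tmp/test; echo "status $?"
```

If the analysis above is right, a btrfs mount on a kernel without btrfs RENAME_EXCHANGE support should give status 1, while ext4 should give 0.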

According to the Btrfs Wiki, this item has not been claimed yet:

Implement new RENAME_* modes

Not claimed — no patches yet — Not in kernel yet

There are new modes of rename syscall.

  * RENAME_EXCHANGE
  * RENAME_WHITEOUT

P.S.(2016/2/24)

This issue seems to have already been reported to LKML and to the CentOS/Red Hat bug trackers.

P.S.(2016/4/1)

A patch implementing RENAME_EXCHANGE for btrfs now seems to be available. According to http://www.spinics.net/lists/linux-btrfs/msg53206.html, it might be merged into Linux 4.7.