Detecting which process is creating a file using LD_PRELOAD trick
The other day I was debugging an issue: I was trying to figure out which process was creating and writing to a file on Linux.
There are multiple ways to detect this (some are better and more efficient than others). In this post I’m going to explain how I did it using the LD_PRELOAD trick.
This approach might not be the best or the most efficient, but it’s a fun one and it can come in handy in many other situations as well.
Alternative solutions
Before diving into how to use LD_PRELOAD to solve this problem, let’s have a look at a couple of other (more efficient) solutions which could also work.
1. Using auditd
auditd is the userspace daemon of the Linux audit subsystem, which is used for monitoring and accounting. Other *nix and BSD based systems include similar subsystems (e.g. a standard installation of FreeBSD includes audit).
Among other things, auditd allows you to monitor file accesses and writes and that’s exactly what we are looking for.
To use it, you first create a file watch rule using auditctl:
I’ve used the w flag because we want to monitor file writes and creates.
And then you can use aureport to view audit reports:
2. Using strace, dtrace or a similar tool for tracking syscalls
Another approach is using a tool like strace or dtrace, which allows you to monitor all the system calls made by a process.
Those tools are very powerful (especially dtrace, which offers a very flexible scripting language), but the problem with dtrace is that it’s not available on all Linux distributions yet, and strace has to be either attached to an already running process or used to launch one.
In my case I didn’t know which process to trace (finding the offending process was the whole point), so this approach doesn’t really work here.
3. Using inotify
Inotify is a kernel subsystem which allows you to monitor and subscribe to file system changes.
There are two problems with this approach which make it less than ideal here:
- inotify is used through system calls which means you need to use some other higher-level tool which uses inotify underneath or write some code yourself.
- inotify only tells you that a file has been modified, but it doesn’t tell you who modified it.
The first problem can be solved pretty easily. You can use an existing tool such as inotifywait or write a couple of lines of code yourself. The good thing is that you can find inotify bindings for most of the popular higher-level languages (e.g. there is pyinotify for Python), and some frameworks like Node.js already provide support for it in the standard library (see fs.watch). The second problem can’t be worked around, because an inotify event simply doesn’t carry any information about the process which caused it.
4. Using fuser
My friend Lakshmi asked whether the fuser command wouldn’t have helped me. The answer is no.
fuser is a useful command line tool which lists all the processes which are currently using a file or a directory (underneath it just uses procfs).
The problem is that it doesn’t work well for my use case. It only lists processes which are currently using an existing file. There are two problems with that:
- It doesn’t support polling and it only works on an existing file.
- It only lists processes which are currently using a file - this means processes which are currently holding a file or socket open.
Both of those problems can be worked around by writing a simple loop which calls fuser indefinitely, but such a loop will most likely miss processes which only open a file for a short amount of time (e.g. a fast open, write, close sequence).
On top of that, this polling approach is very inefficient compared to event-based approaches like the aforementioned inotify subsystem.
Similar arguments also apply to the lsof approach.
LD_PRELOAD approach
OK, now back to the LD_PRELOAD approach I decided to use.
LD_PRELOAD allows you to specify a list of ELF shared libraries to load before other libraries, including libc.
To use it, you simply set the LD_PRELOAD environment variable to point to your shared library or libraries.
For example:
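On the command line this usually takes the form LD_PRELOAD=/path/to/mylib.so some_command. The same thing can be done from code. Here is a minimal sketch in C, where run_with_preload is a hypothetical helper of mine and the library path is whatever you pass in:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with LD_PRELOAD set to `lib`.
 * Returns the child's exit status, or -1 on error. */
int run_with_preload(const char *lib, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the variable is only visible to this process and to
         * whatever it executes, not to the parent. */
        if (setenv("LD_PRELOAD", lib, 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The dynamic linker of the new program reads the variable at startup, so the library is loaded before the program's own code runs.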
This approach is very powerful and you can, among many other things, use it to mask functions provided by libc and other libraries. This comes in very handy in many cases, including this one.
Like the other approaches described above, this one also has some limitations:
- The LD_PRELOAD approach doesn’t work with binaries which have the setuid or setgid permission bit set (see setuid and setgid for more info)
- If you use SELinux, it will, by default, set the AT_SECURE flag on a domain transition (which happens on execve). This puts glibc into secure-execution mode, which means unsafe environment variables such as LD_PRELOAD are ignored in the new process.
I’ve used this approach to solve my problem by masking / wrapping the fopen and open functions provided by libc.
The wrapped functions behave almost the same as the original ones. The only difference is that they log file access information to a file before calling the original function.
In my case, I’ve logged the timestamp and the pid and name of the process which called the function.
Some code which shows how you can do that is shown below.
log_file_access.c
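A minimal sketch of what such a file could look like follows. This is an illustration under my own assumptions, not the exact listing: the log path /tmp/file_access.log, the helper names and the log line format are all made up, and only open and fopen are covered (not openat, open64 and friends).

```c
#define _GNU_SOURCE /* for RTLD_NEXT and program_invocation_short_name */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical log location; pick whatever suits your setup. */
#define LOG_PATH "/tmp/file_access.log"

typedef int (*open_fn)(const char *, int, ...);
typedef FILE *(*fopen_fn)(const char *, const char *);

/* Look up the next definition of open(), i.e. the one provided by libc. */
static open_fn real_open(void)
{
    static open_fn fn;
    if (!fn)
        fn = (open_fn)dlsym(RTLD_NEXT, "open");
    return fn;
}

/* Append "timestamp pid process-name function path" to the log file. */
static void log_access(const char *func, const char *path)
{
    int saved_errno = errno; /* logging must not disturb the caller's errno */
    char line[1024];
    int len = snprintf(line, sizeof(line), "%ld %d %s %s %s\n",
                       (long)time(NULL), (int)getpid(),
                       program_invocation_short_name, func,
                       path ? path : "(null)");
    if (len > 0 && real_open()) {
        if ((size_t)len >= sizeof(line)) { /* truncated: keep the newline */
            len = (int)sizeof(line) - 1;
            line[len - 1] = '\n';
        }
        /* Use the real open() here so that logging doesn't log itself. */
        int fd = real_open()(LOG_PATH, O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (fd >= 0) {
            if (write(fd, line, (size_t)len) < 0) {
                /* nothing sensible to do if the log write fails */
            }
            close(fd);
        }
    }
    errno = saved_errno;
}

/* Wrapper for open(): log first, then call the original. */
int open(const char *path, int flags, ...)
{
    mode_t mode = 0;
    /* The mode argument is only passed with O_CREAT or O_TMPFILE. */
    if ((flags & O_CREAT) || (flags & O_TMPFILE) == O_TMPFILE) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    log_access("open", path);
    if (!real_open()) {
        errno = ENOSYS;
        return -1;
    }
    return real_open()(path, flags, mode);
}

/* Wrapper for fopen(): log first, then call the original. */
FILE *fopen(const char *path, const char *mode)
{
    static fopen_fn real_fopen;
    if (!real_fopen)
        real_fopen = (fopen_fn)dlsym(RTLD_NEXT, "fopen");
    log_access("fopen", path);
    return real_fopen ? real_fopen(path, mode) : NULL;
}
```

Something like gcc -shared -fPIC -o log_file_access.so log_file_access.c -ldl should build it, and LD_PRELOAD=./log_file_access.so some_command should then log every open and fopen call made by some_command.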
Makefile:
Using it:
To make it work for all the started processes, I have modified the upstart scripts and the /etc/profile file to set the LD_PRELOAD environment variable.
Other use cases for LD_PRELOAD
As noted above, LD_PRELOAD can come in handy in many different scenarios. One of the cases worth mentioning is mocking library functions for tests.
A while back, when I was still at Rackspace, we were discussing how to mock and test the MySQL check used by our monitoring agent. The MySQL check fetches a bunch of metrics from a MySQL server using a MySQL client library (libmysql).
One of the approaches I suggested was to use LD_PRELOAD to wrap functions from the MySQL client library and make them return mock data. I thought this was pretty clever, but soon afterwards, Paul came up with an even more clever approach using ffi.
It turned out that the ffi approach was even simpler and better, but this was mostly because, unlike most other languages, Lua includes a really nice ffi library in the core. Okay, cffi for Python is not bad either, but that’s mostly because it’s modeled after the Lua one :-)
If you aren’t using ffi or can’t use it, LD_PRELOAD is a good and valid alternative.
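As a small illustration of the mocking idea, here is a sketch in C which replaces time() from libc with a version that always returns a fixed value. I’m using time() as a stand-in because it has a trivial signature. The constant and the file name are arbitrary, and this is not the MySQL mock we were discussing.

```c
/* fake_time.c (hypothetical file name) */
#include <stdio.h>
#include <time.h>

/* Arbitrary fixed timestamp returned by the mock. */
#define FAKE_NOW ((time_t)1389398400)

/* Same signature as the libc function, so callers can't tell the difference. */
time_t time(time_t *out)
{
    if (out)
        *out = FAKE_NOW;
    return FAKE_NOW;
}
```

When built as a shared library (e.g. gcc -shared -fPIC -o fake_time.so fake_time.c) and preloaded, every call to time() in the tested program returns the same value, which makes time-dependent code deterministic. Code which reads the clock through other functions, such as clock_gettime, is not affected.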
Remember that this is just the tip of the iceberg. Other cool use cases include wrapping the ptrace function to defeat debugger detection and other anti-debugging techniques in an application, and so on.
Edit 1 (January 11th, 2014) - Added a section about fuser.