Sunday, May 27, 2012

Is it safe to parse a /proc/ file?


Well, this is going to be a short one...



I want to parse /proc/net/tcp, but is it safe? How can I open and read it without worrying that some other process (or the OS itself) will change it at the same time?



Source: Tips4all

7 comments:

  1. Although the files in /proc appear as regular files in userspace, they are not really files but rather entities that support the standard file operations from userspace (open, read, close). Note that this is quite different than having an ordinary file on disk that is being changed by the kernel.

    All the kernel does is print its internal state into its own memory using a sprintf-like function, and that memory is copied into userspace whenever you issue a read(2) system call.

    The kernel handles these calls in an entirely different way than for regular files, which could mean that the entire snapshot of the data you will read is ready at the time you open(2) it, while the kernel makes sure that concurrent calls are consistent and atomic. I haven't read that anywhere, but it wouldn't really make sense for it to be otherwise.

    My advice is to take a look at the implementation of a proc file in your particular Unix flavour. This is really an implementation issue (as is the format and the contents of the output) that is not governed by a standard.

    The simplest example would be the implementation of the uptime proc file in Linux:
    http://lxr.free-electrons.com/source/fs/proc/uptime.c. Note how the entire buffer is produced in the callback function supplied to single_open.

  2. In general, no. (So most of the answers here are wrong.) It might be safe, depending on what property you want. But it's easy to end up with bugs in your code if you assume too much about the consistency of a file in /proc. For example, see this bug which came from assuming that /proc/mounts was a consistent snapshot.

    For example:


    /proc/uptime is totally atomic, as someone mentioned in another answer -- but only since Linux 2.6.30, which is less than two years old. So even this tiny, trivial file was subject to a race condition until then, and still is in most enterprise kernels. See fs/proc/uptime.c for the current source, or the commit that made it atomic. On a pre-2.6.30 kernel, you can open the file, read a bit of it, then if you later come back and read again, the piece you get will be inconsistent with the first piece. (I just demonstrated this -- try it yourself for fun.)
    /proc/mounts is atomic within a single read system call. So if you read the whole file all at once, you get a single consistent snapshot of the mount points on the system. However, if you use several read system calls -- and if the file is big, this is exactly what will happen if you use normal I/O libraries and don't pay special attention to this issue -- you will be subject to a race condition. Not only will you not get a consistent snapshot, but mount points which were present before you started and never stopped being present might go missing in what you see. To see that it's atomic for one read(), look at m_start() in fs/namespace.c and see it grab a semaphore that guards the list of mountpoints, which it keeps until m_stop(), which is called when the read() is done. To see what can go wrong, see this bug from last year (same one I linked above) in otherwise high-quality software that blithely read /proc/mounts.
    /proc/net/tcp, which is the one you're actually asking about, is even less consistent than that. It's atomic only within each row of the table. To see this, look at listening_get_next() in net/ipv4/tcp_ipv4.c and established_get_next() just below in the same file, and see the locks they take out on each entry in turn. I don't have repro code handy to demonstrate the lack of consistency from row to row, but there are no locks there (or anything else) that would make it consistent. Which makes sense if you think about it -- networking is often a super-busy part of the system, so it's not worth the overhead to present a consistent view in this diagnostic tool.
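
    The difference between one read() and several can be checked from userspace. What follows is a minimal Python sketch, not a definitive implementation: read_counting_calls is a hypothetical helper name, and the 1 MiB buffer size is an arbitrary assumption. It reads a file with raw read(2) calls and reports how many of them returned data. A count of 1 means the whole file came back from a single call; a higher count means the content was assembled from several calls and, by the reasoning above, may not be one consistent snapshot. Whether a large user buffer actually yields a single call depends on the kernel's internal buffering, so treat the count as a diagnostic rather than a guarantee.

```python
import os


def read_counting_calls(path, bufsize=1 << 20):
    """Read `path` with raw read(2) calls (hypothetical helper).

    Returns (data, n_reads), where n_reads is the number of calls
    that returned data before end of file was reached.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            # os.read() asks the kernel for at most `bufsize` bytes;
            # the kernel is free to return fewer.
            chunk = os.read(fd, bufsize)
            if not chunk:  # an empty result signals end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks), len(chunks)
```

    On a Linux system, read_counting_calls("/proc/mounts") returns the file contents together with the number of read() calls it took to collect them.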


    The other piece that keeps /proc/net/tcp atomic within each row is the buffering in seq_read(), which you can read in fs/seq_file.c. This ensures that once you read() part of one row, the text of the whole row is kept in a buffer so that the next read() will get the rest of that row before starting a new one. The same mechanism is used in /proc/mounts to keep each row atomic even if you do multiple read() calls, and it's also the mechanism that /proc/uptime in newer kernels uses to stay atomic. That mechanism does not buffer the whole file, because the kernel is cautious about memory use.

    Most files in /proc will be at least as consistent as /proc/net/tcp, with each row a consistent picture of one entry in whatever information they're providing, because most of them use the same seq_file abstraction. As the /proc/uptime example illustrates, though, some files were still being migrated to use seq_file as recently as 2009; I bet there are still some that use older mechanisms and don't have even that level of atomicity. These caveats are rarely documented. For a given file, your only guarantee is to read the source.

    In the case of /proc/net/tcp, you can read it and parse each line without fear. But if you try to draw any conclusions from multiple lines at once -- beware, other processes and the kernel are changing it while you read it, and you are probably creating a bug.
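
    Parsing one row at a time might look like the following Python sketch. The function names are hypothetical, and the field positions are taken from typical /proc/net/tcp output (sl, local_address, rem_address, st, ..., uid, timeout, inode), so check them against your kernel. The sketch also assumes a little-endian host, where the IPv4 address appears byte-reversed in the hex field; pass byteorder="big" on a big-endian machine. The state names follow the kernel's TCP state numbering.

```python
import ipaddress

# TCP state codes (hex "st" column) mapped to their usual names.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}


def decode_endpoint(field, byteorder="little"):
    """Decode an 'ADDR:PORT' hex field into (dotted address, port)."""
    addr_hex, port_hex = field.split(":")
    # The address is printed as one 32-bit hex number in host byte order.
    raw = int(addr_hex, 16).to_bytes(4, byteorder)
    return str(ipaddress.IPv4Address(raw)), int(port_hex, 16)


def parse_tcp_row(line, byteorder="little"):
    """Parse a single data row of /proc/net/tcp (not the header line)."""
    fields = line.split()
    state = int(fields[3], 16)
    return {
        "local": decode_endpoint(fields[1], byteorder),
        "remote": decode_endpoint(fields[2], byteorder),
        # Unknown codes are kept as hex strings rather than guessed.
        "state": TCP_STATES.get(state, hex(state)),
        "uid": int(fields[7]),
        "inode": int(fields[9]),
    }
```

    Each call looks at exactly one row, which is the unit the kernel keeps consistent. Anything computed across the returned rows (counts, matching pairs of sockets) is subject to the caveat above.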

  3. /proc is a virtual file system: in fact, it just gives a convenient view of the kernel internals. It's definitely safe to read it (that's why it's there), but it's risky in the long term, as the internals of these virtual files may evolve with newer kernel versions.

    EDIT

    More information is available in the proc documentation in the Linux kernel docs, chapter 1.4 Networking.
    I can't find out how the information evolves over time. I thought it was frozen on open, but I can't find a definite answer.

    EDIT2

    According to the SCO documentation (not Linux, but I'm pretty sure all flavours of *nix behave like that):


    Although process state and consequently the contents of /proc files can change from instant to instant, a single read(2) of a /proc file is guaranteed to return a ``sane'' representation of state, that is, the read will be an atomic snapshot of the state of the process. No such guarantee applies to successive reads applied to a /proc file for a running process. In addition, atomicity is specifically not guaranteed for any I/O applied to the as (address-space) file; the contents of any process's address space might be concurrently modified by an LWP of that process or any other process in the system.

  4. The procfs API in the Linux kernel provides an interface to make sure that reads return consistent data. Read the comments in __proc_file_read. Item 1) in the big comment block explains this interface.

    That being said, it is of course up to the implementation of a specific proc file to use this interface correctly to make sure its returned data is consistent. So, to answer your question: no, the kernel does not guarantee consistency of the proc files during a read but it provides the means for the implementations of those files to provide consistency.

  5. I have the source for Linux 2.6.27.8 handy since I'm doing driver development at the moment on an embedded ARM target.

    The file ...linux-2.6.27.8-lpc32xx/net/ipv4/raw.c at line 934 contains, for example

    seq_printf(seq, "%4d: %08X:%04X %08X:%04X"
    " %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
    i, src, srcp, dest, destp, sp->sk_state,
    atomic_read(&sp->sk_wmem_alloc),
    atomic_read(&sp->sk_rmem_alloc),
    0, 0L, 0, sock_i_uid(sp), 0, sock_i_ino(sp),
    atomic_read(&sp->sk_refcnt), sp, atomic_read(&sp->sk_drops));


    which outputs

    [wally@zenetfedora ~]$ cat /proc/net/tcp
    sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
    0: 017AA8C0:0035 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 15160 1 f552de00 299
    1: 00000000:C775 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 13237 1 f552ca00 299
    ...


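    This code is in the function raw_sock_seq_show(), which is part of a hierarchy of procfs handling functions. (Strictly speaking, raw.c generates /proc/net/raw; the rows of /proc/net/tcp come from tcp4_seq_show() in net/ipv4/tcp_ipv4.c, which uses a similar format.) The text is not generated until a read() request is made on the file, a reasonable mechanism since procfs reads are surely much less common than updates to the information.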

    Some drivers (such as mine) implement the proc_read function with a single sprintf(). The extra complication in the core drivers' implementation is there to handle potentially very long output, which may not fit in the intermediate kernel-space buffer during a single read.

    I tested that with a program using a 64K read buffer, but on my system it results in a kernel-space buffer of only 3072 bytes for proc_read to return data. Multiple calls with advancing pointers are needed to get more than that much text returned. I don't know the right way to make the returned data consistent when more than one I/O is needed. Certainly each entry in /proc/net/tcp is self-consistent. There is some likelihood that adjacent lines are snapshots taken at different times.

  6. Short of unknown bugs, there are no race conditions in /proc that would lead to reading corrupted data or a mix of old and new data. In this sense, it's safe. However, there is still the race condition that much of the data you read from /proc is potentially outdated as soon as it's generated, and even more so by the time you get to reading or processing it. For instance, processes can die at any time and a new process can be assigned the same pid; the only process ids you can ever use without race conditions are those of your own child processes. The same goes for network information (open ports, etc.) and really most of the information in /proc. I would consider it bad and dangerous practice to rely on any data in /proc being accurate, except data about your own process and potentially its child processes. Of course, it may still be useful to present other information from /proc to the user or admin for informational or logging purposes.
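
    One way to at least notice that a pid has been reused is to remember the process start time along with the pid. This technique is not part of the answer above; it is offered as a hedged Python sketch, and the function names are hypothetical. It assumes the Linux /proc/<pid>/stat layout described in proc(5), where field 22 is the start time in clock ticks since boot. Comparing (pid, starttime) pairs narrows the race but does not remove it: the process can still exit between the check and whatever you do next.

```python
def parse_stat_identity(stat_text):
    """Return (pid, starttime) from the text of a /proc/<pid>/stat file."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split at the LAST ')' instead of on spaces.
    head, _, tail = stat_text.rpartition(")")
    pid = int(head.split("(", 1)[0])
    rest = tail.split()        # rest[0] is field 3 (process state)
    starttime = int(rest[19])  # field 22: start time in clock ticks
    return pid, starttime


def process_identity(pid):
    """Read /proc/<pid>/stat (Linux only) and return (pid, starttime)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_identity(f.read())
```

    If process_identity(pid) returns a different start time than the one recorded earlier, the pid now belongs to a different process.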

  7. When you read from a /proc file, the kernel is calling a function which has been registered in advance to be the "read" function for that proc file. See the __proc_file_read function in fs/proc/generic.c .

    Therefore, a proc read is only as safe as the function the kernel calls to satisfy the read request. If that function properly locks all the data it touches and returns it to you in a buffer, then it is completely safe to read using that function. Since proc files like the one used for satisfying read requests to /proc/net/tcp have been around for a while and have undergone scrupulous review, they are about as safe as you could ask for. In fact, many common Linux utilities rely on reading from the proc filesystem and formatting the output in a different way. (Off the top of my head, I think 'ps' and 'netstat' do this.)

    As always, you don't have to take my word for it; you can look at the source to calm your fears. The following documentation from proc_net_tcp.txt tells you where the "read" functions for /proc/net/tcp live, so you can look at the actual code that is run when you read from that proc file and verify for yourself that there are no locking hazards.


    This document describes the interfaces /proc/net/tcp and /proc/net/tcp6. Note that these interfaces are deprecated in favor of tcp_diag.
    These /proc interfaces provide information about currently active TCP connections, and are implemented by tcp4_seq_show() in net/ipv4/tcp_ipv4.c and tcp6_seq_show() in net/ipv6/tcp_ipv6.c, respectively.
