Sunday, May 20, 2012

How can I get `find` to ignore .svn directories?


I often use the find command to search through source code, delete files, and so on. Annoyingly, because Subversion stores a duplicate of each file in its .svn/text-base/ directories, my simple searches return lots of duplicate results. For example, I want to recursively search for uint in multiple messages.h and messages.cpp files:




# find -name 'messages.*' -exec grep -Iw uint {} +
./messages.cpp: Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./messages.cpp: Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./messages.cpp: Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./messages.cpp: Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./messages.cpp: Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./messages.cpp: Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./messages.cpp: for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./.svn/text-base/messages.cpp.svn-base: Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./.svn/text-base/messages.cpp.svn-base: Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base: Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./.svn/text-base/messages.cpp.svn-base: Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./.svn/text-base/messages.cpp.svn-base: Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base: Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base: for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./virus/messages.cpp:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/messages.cpp:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/messages.h: void _progress(const std::string &fileName, uint scanCount);
./virus/messages.h: ProgressMessage(const std::string &fileName, uint scanCount);
./virus/messages.h: uint _scanCount;
./virus/.svn/text-base/messages.cpp.svn-base:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/.svn/text-base/messages.cpp.svn-base:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/.svn/text-base/messages.h.svn-base: void _progress(const std::string &fileName, uint scanCount);
./virus/.svn/text-base/messages.h.svn-base: ProgressMessage(const std::string &fileName, uint scanCount);
./virus/.svn/text-base/messages.h.svn-base: uint _scanCount;



How can I tell find to ignore the .svn directories?


Source: Tips4all

13 comments:

  1. For searching, can I suggest you look at ack? It's a source-code-aware find, and as such it automatically ignores many file types, including source code repository metadata such as the above.

  2. As follows:

    find . -path '*/.svn*' -prune -o -print


    Or, alternatively, matching on the directory name rather than on a path pattern:

    find . -name .svn -a -type d -prune -o -print
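A minimal sketch of the pruning form above, run against a throwaway tree. The directory layout and file names are made up for the demo; only the find expression comes from the thread.

```shell
# Throwaway tree imitating a Subversion working copy (all names are made up).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/.svn/text-base" "$demo/virus/.svn/text-base"
touch "$demo/messages.cpp" \
      "$demo/.svn/text-base/messages.cpp.svn-base" \
      "$demo/virus/messages.h" \
      "$demo/virus/.svn/text-base/messages.h.svn-base"
cd "$demo" || exit 1

# -prune stops find from descending into .svn; the right-hand side of -o
# is evaluated only for entries that were not pruned.
pruned=$(find . -name .svn -type d -prune -o -name 'messages.*' -print | LC_ALL=C sort)
printf '%s\n' "$pruned"
```

Only the two working-copy files should be listed; the .svn-base copies are never visited.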

  3. why not just

    find . -not -iwholename '*.svn*'


    The -not operator negates the test that follows it, so this skips every path that contains .svn.

    So in your case it would be

    find . -not -iwholename '*.svn*' -name 'messages.*' -exec grep -Iw uint {} +
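The wildcard placement in the path pattern matters. The sketch below compares two patterns on a made-up tree, using -path, which on GNU find is the case-sensitive equivalent of -iwholename; the tree and variable names are assumptions for illustration.

```shell
# Throwaway tree imitating a Subversion working copy (all names are made up).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/.svn/text-base" "$demo/virus/.svn/text-base"
touch "$demo/messages.cpp" \
      "$demo/.svn/text-base/messages.cpp.svn-base" \
      "$demo/virus/messages.h" \
      "$demo/virus/.svn/text-base/messages.h.svn-base"
cd "$demo" || exit 1

# '*.svn' matches only paths that END in .svn, so files below .svn/ slip through.
loose=$(find . -not -path '*.svn' -name 'messages.*' | LC_ALL=C sort)

# '*/.svn*' also matches everything underneath a .svn directory.
strict=$(find . -not -path '*/.svn*' -name 'messages.*' | LC_ALL=C sort)

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Either way find still walks into the .svn directories and merely filters the results, which is why the -prune forms tend to be faster on large trees.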

  4. Create a script called ~/bin/svnfind:

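    #!/bin/bash
    #
    # Attempts to behave identically to a plain `find' command while ignoring .svn/
    # directories.

    OPTIONS=()
    PATHS=()
    EXPR=()

    # Leading -H, -L and -P options must stay ahead of the paths.
    while [[ $1 =~ ^-[HLP]+$ ]]; do
        OPTIONS+=("$1")
        shift
    done

    # Everything up to the first expression token is a starting path. The regex
    # lives in a variable because quoting it inline makes bash (3.2 and later)
    # match it as a literal string.
    EXPR_START='^[-(),!]'
    while [[ $# -gt 0 ]] && ! [[ $1 =~ $EXPR_START ]]; do
        PATHS+=("$1")
        shift
    done

    # If user's expression contains no action then we'll add the normally-implied
    # `-print'. -prune is not in the list below because it does not suppress
    # find's implied `-print'.
    ACTION=-print

    while [[ $# -gt 0 ]]; do
        case "$1" in
            -delete|-exec|-execdir|-fls|-fprint|-fprint0|-fprintf|-ok|-print|-okdir|-print0|-printf|-quit|-ls)
                ACTION=;;
        esac

        EXPR+=("$1")
        shift
    done

    # ${#EXPR[@]} is the number of elements; ${#EXPR} would be the length of the
    # first element only.
    if [[ ${#EXPR[@]} -eq 0 ]]; then
        EXPR=(-true)
    fi

    exec -a "$(basename "$0")" find "${OPTIONS[@]}" "${PATHS[@]}" -name .svn -type d -prune -o '(' "${EXPR[@]}" ')' $ACTION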


    This script behaves identically to a plain find command, except that it prunes .svn directories.

    Example:

    # svnfind -name 'messages.*' -exec grep -Iw uint {} +
    ./messages.cpp: Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
    ./messages.cpp: Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
    ./messages.cpp: Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
    ./messages.cpp: Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
    ./messages.cpp: Log::verbose << "Sent message: id " << uint(preparedMessage->id)
    ./messages.cpp: Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
    ./messages.cpp: for (uint i = 0; i < 10 && !_stopThreads; ++i) {
    ./virus/messages.cpp:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
    ./virus/messages.cpp:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
    ./virus/messages.h: void _progress(const std::string &fileName, uint scanCount);
    ./virus/messages.h: ProgressMessage(const std::string &fileName, uint scanCount);
    ./virus/messages.h: uint _scanCount;

  5. With GNU find:

    find . ! -regex ".*[/]\.svn[/]?.*"

  6. I use grep for this purpose. Put this in your ~/.bashrc

    export GREP_OPTIONS="--binary-files=without-match --color=auto --devices=skip --exclude-dir=CVS --exclude-dir=.libs --exclude-dir=.deps --exclude-dir=.svn"


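    grep automatically picks up these options on every invocation. Note that GNU grep 2.21 and later treat GREP_OPTIONS as obsolescent, so on newer systems an alias or shell function is the safer route.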
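For systems where an environment variable is not an option, the same idea can be sketched as a shell function. The name vcgrep and the demo tree are hypothetical; --exclude-dir, --include and --binary-files are GNU grep options, so this assumes GNU grep.

```shell
# Throwaway tree imitating a Subversion working copy (all names are made up).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/.svn/text-base" "$demo/virus"
echo 'uint scanCount;' > "$demo/messages.cpp"
echo 'uint scanCount;' > "$demo/.svn/text-base/messages.cpp.svn-base"
echo 'uint scanCount;' > "$demo/virus/messages.h"
cd "$demo" || exit 1

# Hypothetical wrapper: grep with the version-control directories excluded.
vcgrep() {
    grep --binary-files=without-match --exclude-dir=.svn --exclude-dir=CVS "$@"
}

# Recursive search limited to messages.* files; -l lists matching file names.
hits=$(vcgrep -rlw --include='messages.*' uint . | LC_ALL=C sort)
printf '%s\n' "$hits"
```

With a recursive grep like this, find is not needed at all for the search in the question.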

  7. Why don't you pipe your command through grep, which is easy to understand:

    your find command | grep -v '\.svn'

  8. wcfind is a find wrapper script that I use to automagically remove .svn directories.

  9. Just thought I'd add a simple alternative to Kaleb's and others' posts (which detailed the use of the find -prune option, ack, repofind commands etc.) which is particularly applicable to the usage you have described in the question (and any other similar usages):


    For performance, you should always try to use find ... -exec grep ... + (thanks Kenji for pointing this out) or find ... | xargs egrep ... (portable) or find ... -print0 | xargs -0 egrep ... (GNU; works on filenames containing spaces) instead of find ... -exec grep ... \;.

    The find ... -exec ... + and find | xargs forms do not fork egrep for each file, but rather once per batch of files, resulting in much faster execution.
    When using the find | xargs form you can also use grep to easily and quickly prune .svn (or any directory or regular expression), i.e. find ... | grep -v '/\.svn' | xargs egrep ... (useful when you need something quick and can't be bothered to remember how to set up find's -prune logic). With -print0 and xargs -0, the grep in the middle needs GNU grep's -z option so that it also reads NUL-separated names: find ... -print0 | grep -zv '/\.svn' | xargs -0 egrep ...

    The find | grep | xargs approach is similar to GNU find's -regex option (see ghostdog74's post), but is more portable (will also work on platforms where GNU find is not available.)
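A minimal sketch of the NUL-separated pipeline, assuming GNU find, grep and xargs. The demo tree, including the directory name with a space in it, is made up to show that such names survive the pipeline.

```shell
# Throwaway tree; "virus scan" contains a space on purpose (names are made up).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/.svn/text-base" "$demo/virus scan/.svn/text-base"
echo 'uint id;' > "$demo/messages.cpp"
echo 'uint id;' > "$demo/.svn/text-base/messages.cpp.svn-base"
echo 'uint id;' > "$demo/virus scan/messages.h"
echo 'uint id;' > "$demo/virus scan/.svn/text-base/messages.h.svn-base"
cd "$demo" || exit 1

# -print0 emits NUL-terminated names, grep -z reads and writes NUL-terminated
# records, and xargs -0 hands them to a single grep in batches.
matches=$(find . -name 'messages.*' -print0 \
    | grep -zv '/\.svn/' \
    | xargs -0 grep -lIw uint \
    | LC_ALL=C sort)
printf '%s\n' "$matches"
```

Without -z, the middle grep would see the whole NUL-separated stream as one record and -v would discard all of it as soon as any path contained /.svn/.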

  10. Try findrepo, which is a simple wrapper around find/grep and much faster than ack.
    You would use it in this case like:

    findrepo uint 'messages.*'

  11. Here is what I would do in your case:

    find . -path '*/.svn' -prune -o -name 'messages.*' -exec grep -Iw uint {} +




    Emacs' built-in rgrep command ignores .svn directories, along with many other files you're probably not interested in when performing a find | grep. Here is what it uses by default:

    find . \( -path \*/SCCS -o -path \*/RCS -o -path \*/CVS -o -path \*/MCVS \
    -o -path \*/.svn -o -path \*/.git -o -path \*/.hg -o -path \*/.bzr \
    -o -path \*/_MTN -o -path \*/_darcs -o -path \*/\{arch\} \) \
    -prune -o \
    \( -name .\#\* -o -name \*.o -o -name \*\~ -o -name \*.bin -o -name \*.lbin \
    -o -name \*.so -o -name \*.a -o -name \*.ln -o -name \*.blg \
    -o -name \*.bbl -o -name \*.elc -o -name \*.lof -o -name \*.glo \
    -o -name \*.idx -o -name \*.lot -o -name \*.fmt -o -name \*.tfm \
    -o -name \*.class -o -name \*.fas -o -name \*.lib -o -name \*.mem \
    -o -name \*.x86f -o -name \*.sparcf -o -name \*.fasl -o -name \*.ufsl \
    -o -name \*.fsl -o -name \*.dxl -o -name \*.pfsl -o -name \*.dfsl \
    -o -name \*.p64fsl -o -name \*.d64fsl -o -name \*.dx64fsl -o -name \*.lo \
    -o -name \*.la -o -name \*.gmo -o -name \*.mo -o -name \*.toc \
    -o -name \*.aux -o -name \*.cp -o -name \*.fn -o -name \*.ky \
    -o -name \*.pg -o -name \*.tp -o -name \*.vr -o -name \*.cps \
    -o -name \*.fns -o -name \*.kys -o -name \*.pgs -o -name \*.tps \
    -o -name \*.vrs -o -name \*.pyc -o -name \*.pyo \) \
    -prune -o \
    -type f \( -name pattern \) -print0 \
    | xargs -0 -e grep -i -nH -e regex


    It ignores directories created by most version control systems, as well as generated files for many programming languages.
    You could create an alias that invokes this command and replace find and grep patterns for your specific problems.
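One way to package this is a small shell function rather than an alias, since a function can take the two patterns as arguments. The sketch below is a cut-down variant, not the Emacs command itself: the name srcfind, its argument order and the shortened prune list are all assumptions.

```shell
# Hypothetical helper: srcfind NAME_PATTERN REGEX [START_DIR]
srcfind() {
    find "${3:-.}" \
        \( -path '*/.svn' -o -path '*/.git' -o -path '*/.hg' -o -path '*/CVS' \) \
        -prune -o \
        -type f -name "$1" -exec grep -nHIw -e "$2" {} +
}

# Throwaway tree to try it on (all names are made up).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/.svn/text-base" "$demo/.git"
echo 'uint id;' > "$demo/messages.cpp"
echo 'uint id;' > "$demo/.svn/text-base/messages.cpp.svn-base"
echo 'uint id;' > "$demo/.git/messages.h"
cd "$demo" || exit 1

out=$(srcfind 'messages.*' uint)
printf '%s\n' "$out"
```

Only ./messages.cpp is searched here; the copies under .svn and .git are pruned before the -name test is ever evaluated.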

  12. This works for me at the Unix prompt:


    gfind . \( -not -wholename '*\.svn*' \) -type f -name 'messages.*' \
    -exec grep -Iw uint {} +


    The command above selects regular files whose path does not contain .svn and runs the grep you mentioned on them.
