API

formic Package

An implementation of Apache Ant globs.

formic Module

An implementation of Ant Globs.

The main entry points for this module are:

  • FileSet: A collection of include and exclude globs starting at a specific directory.
  • Pattern: An individual glob
class formic.formic.ConstantMatcher(pattern)

Bases: formic.formic.Matcher

A Matcher for matching the constant passed in the constructor.

This is used to more efficiently match path and file elements that do not have a wild-card, eg __init__.py

match(string)

Returns True if the argument matches the constant.

class formic.formic.FNMatcher(pattern)

Bases: formic.formic.Matcher

A Matcher that matches simple file/directory wildcards as per DOS or Unix.

  • FNMatcher("*.py") matches all Python files in a given directory.
  • FNMatcher("?ed") matches bed, fed, wed but not failed

FNMatcher internally uses fnmatch.fnmatch() to implement Matcher.match()

match(string)

Returns True if the pattern matches the string
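
The wildcard semantics above can be tried directly with the standard-library fnmatch module. This is a minimal sketch that does not import formic; it only shows the behaviour of the fnmatch calls that FNMatcher is documented to rely on.

```python
import fnmatch

# The names from the FNMatcher example above.
candidates = ["bed", "fed", "wed", "failed"]

# "?" matches exactly one character, so "?ed" only matches three-letter names.
three_letter = [name for name in candidates if fnmatch.fnmatch(name, "?ed")]

# "*" matches any run of characters (including none).
python_files = fnmatch.filter(["setup.py", "README.txt", "__init__.py"], "*.py")

print(three_letter)
print(python_files)
```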

class formic.formic.FileSet(include, exclude=None, directory=None, default_excludes=True, walk=os.walk, symlinks=True)

Bases: object

An implementation of the Ant FileSet class.

Arguments to the constructor:

  1. include: An Ant glob or list of Ant globs for matching files to include in the response. Ant globs can be specified either:
    1. As a string, eg "*.py", or
    2. As a Pattern object
  2. exclude: Specified in the same way as include, but any file that matches an exclude glob will be excluded from the result.
  3. directory: The directory from which to start the search; if None, the current working directory is used.
  4. default_excludes: A boolean; if True (or omitted) the DEFAULT_EXCLUDES will be combined with the exclude. If False, the only excludes used are those in the exclude argument.
  5. symlinks: Sets whether symbolic links are included in the results or not.
  6. walk: A function whose argument is a single directory that returns a list of (dirname, subdirectoryNames, fileNames) tuples with the same semantics of os.walk(). Defaults to os.walk()

Usage

First, construct a FileSet:

from formic import FileSet
fileset = FileSet(directory="/some/where/interesting",
                  include="*.py",
                  exclude=["**/*test*/**", "test*"]
                  )

There are three APIs for retrieving matches:

  1. FileSet is itself an iterator and returns absolute file names:

    for filename in fileset:
        print(filename)
    
  2. For more control, use fileset.qualified_files(). The following prints filenames relative to the directory:

    for filename in fileset.qualified_files(absolute=False):
        print(filename)
    
  3. For absolute control, use the fileset.files() method and handle the returned tuple yourself:

    import sys
    from os import path

    prefix = fileset.get_directory()
    for directory, file_name in fileset.files():
        sys.stdout.write(prefix)
        # Skip the directory part when it is empty
        if directory:
            sys.stdout.write(path.sep)
            sys.stdout.write(directory)
        sys.stdout.write(path.sep)
        sys.stdout.write(file_name)
        sys.stdout.write("\n")
    

Implementation notes:

  • FileSet is lazy: The files in the FileSet are resolved at the time the iterator is looped over. This means that it is very fast to set up and (can be) computationally expensive only when results are obtained.

  • You can iterate over the same FileSet instance as many times as you want. Because the results are computed as you iterate over the object, each separate iteration can return different results, eg if the file system has changed.

  • include and exclude arguments to the constructor can be given in several ways:

    • A string: This will be automatically turned into a Pattern
    • A Pattern: If you prefer to construct the pattern yourself
    • A list of strings and/or Pattern instances (as above)
  • In addition to Apache Ant’s default excludes, FileSet excludes:

    • __pycache__
  • You can modify the DEFAULT_EXCLUDES class member (it is a list of Pattern instances). Doing so will modify the behaviour of all instances of FileSet using default excludes.

  • You can provide an alternate function to os.walk() that, for example, heavily truncates the files and directories being searched or returns files and directories that don’t even exist on the file system. This can be useful for testing or even for passing the results of one FileSet as the search path of a second. See formic.walk_from_list():

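       from formic import FileSet, walk_from_list

       files = ["CVS/error.py", "silly/silly1.txt", "1/2/3.py", "silly/silly3.txt", "1/2/4.py", "silly/silly3.txt"]
       fileset = FileSet(include="*.py", walk=walk_from_list(files))
       # files() yields (directory, file name) tuples
       for directory, file_name in fileset.files():
           print(directory, file_name)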
    
    This lists 1/2/3.py and 1/2/4.py no matter what the contents of the
    current directory are. CVS/error.py is not listed because of the default
    excludes.
DEFAULT_EXCLUDES = [**/__pycache__/**/*, **/*~, **/#*#, **/.#*, **/%*%, **/._*, **/CVS, **/CVS/**/*, **/.cvsignore, **/SCCS, **/SCCS/**/*, **/vssver.scc, **/.svn, **/.svn/**/*, **/.DS_Store, **/.git, **/.git/**/*, **/.gitattributes, **/.gitignore, **/.gitmodules, **/.hg, **/.hg/**/*, **/.hgignore, **/.hgsub, **/.hgsubstate, **/.hgtags, **/.bzr, **/.bzr/**/*, **/.bzrignore]

Default excludes shared by all instances. The member is a list of Pattern instances. You may modify this member at run time to modify the behaviour of all instances.

files()

A generator function for iterating over the individual files of the FileSet.

The generator yields a tuple of (rel_dir_name, file_name):

  1. rel_dir_name: The path relative to the starting directory
  2. file_name: The unqualified file name
get_directory()

Returns the directory in which the FileSet will be run.

If the directory was set with None in the constructor, get_directory() will return the current working directory.

The returned result is normalized so it never contains a trailing path separator

qualified_files(absolute=True)

An alternative generator that yields files rather than directory/file tuples.

If absolute is false, paths relative to the starting directory are returned, otherwise files are fully qualified.
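
The relationship between the (rel_dir_name, file_name) tuples of files() and the paths of qualified_files() can be sketched as follows. The helper name qualify and the sample data are hypothetical, and this is not formic's implementation; it only illustrates how such tuples could be joined with os.path.

```python
import os.path

def qualify(start_dir, pairs, absolute=True):
    """Yield one path per (rel_dir_name, file_name) tuple.

    Hypothetical helper, not formic's implementation.
    """
    for rel_dir, file_name in pairs:
        # os.path.join skips empty components, so files in the starting
        # directory itself (rel_dir == "") do not get a doubled separator.
        relative = os.path.join(rel_dir, file_name)
        yield os.path.join(start_dir, relative) if absolute else relative

pairs = [("", "setup.py"), ("pkg", "__init__.py")]
relative_paths = list(qualify("/src", pairs, absolute=False))
absolute_paths = list(qualify("/src", pairs))
```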

class formic.formic.FileSetState(label, directory, based_on=None, unmatched=None)

Bases: object

FileSetState is an object encapsulating the FileSet in a particular directory, caching inheritable Pattern matches.

This is an internal implementation class and not meant for reuse or to be accessed directly

Implementation notes:

As the FileSet traverses the directories using, by default, os.walk(), it builds two graphs of FileSetState instances mirroring the graph of directories - one graph of FileSetState instances is for the include globs and the other graph of FileSetState instances for the exclude.

FileSetState embodies logic to decide whether to prune whole directories from the search, either by detecting the include patterns cannot match any file within, or by detecting that an exclude matches all files in this directory and sub-directories.

The constructor has the following arguments:

  1. label: A string used only in the __str__() method (for debugging)
  2. directory: The point in the graph that this FileSetState represents. directory is relative to the starting node of the graph
  3. based_on: A FileSetState from the previous directory traversed by FileSet.walk(). This is used as the start point in the graph of FileSetStates to search for the correct parent of this. This is None to create the root node.
  4. unmatched: Used only when based_on is None - the set of initial Pattern instances. This is either the original include or exclude globs.

During the construction of the instance, the instance will evaluate the directory patterns in PatternSet self.unmatched and, for each Pattern, perform one of the following actions:

  1. If a pattern matches, it will be moved into one of the ‘matched’ PatternSet instances:
    1. self.matched_inherit: the directory pattern matches all subdirectories as well, eg /test/**
    2. self.matched_and_subdir: the directory matches this directory and may match subdirectories as well, eg /test/**/more/**
    3. self.matched_no_subdir: the directory matches this directory, but cannot match any subdirectory, eg /test/*. This pattern will thus not be evaluated in any subdirectory.
  2. If the pattern does not match, either:
    1. It may be valid in subdirectories, so it stays in self.unmatched, eg **/nomatch/*
    2. It cannot evaluate to true in any subdirectory, eg /nomatch/**. In this case it is removed from all PatternSet members in this instance.
match(files)

Given a set of files in this directory, returns all the files that match the Pattern instances which match this directory.

matches_all_files_all_subdirs()

Returns True if there is a pattern that:

  • Matches this directory, and
  • Matches all sub-directories, and
  • Matches all files (eg ends with “*”)

This acts as a terminator for FileSetState instances in the excludes graph.

no_possible_matches_in_subdirs()

Returns True if there are no possible matches for any subdirectories of this FileSetState.

When this FileSetState is used for an ‘include’, a return of True means we can exclude all subdirectories.
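
Pruning whole directories from a traversal can be illustrated with plain os.walk(). The sketch below is not how FileSetState is implemented; it assumes a simple set of directory names to skip (the function walk_pruned and the temporary layout are hypothetical) and relies on the documented os.walk() behaviour that editing the subdirectory list in place stops the descent.

```python
import os
import tempfile

def walk_pruned(top, skip_names):
    """Yield (dirpath, filenames), never descending into skip_names."""
    for dirpath, dirnames, filenames in os.walk(top):
        # With the default topdown=True, assigning to dirnames[:] in place
        # tells os.walk() not to visit the removed subdirectories at all.
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        yield dirpath, sorted(filenames)

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "pkg"))
    os.makedirs(os.path.join(top, ".git", "objects"))
    open(os.path.join(top, "pkg", "mod.py"), "w").close()
    open(os.path.join(top, ".git", "objects", "blob"), "w").close()
    visited = sorted(os.path.relpath(d, top) for d, _ in walk_pruned(top, {".git"}))

print(visited)
```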

exception formic.formic.FormicError(message=None)

Bases: exceptions.Exception

Formic errors, such as misconfigured arguments and internal exceptions

class formic.formic.MatchType

Bases: object

An enumeration of different match/non-match types to optimize the search algorithm.

There are two special considerations in match results that derive from the fact that Ant globs can be ‘bound’ to the start of the path being evaluated (eg bound start: /Documents/**).

The various match possibilities are bitfields using the members starting BIT_.

BIT_ALL_SUBDIRECTORIES = 2
BIT_MATCH = 1
BIT_NO_SUBDIRECTORIES = 4
MATCH = 1
MATCH_ALL_SUBDIRECTORIES = 3
MATCH_BUT_NO_SUBDIRECTORIES = 5
NO_MATCH = 0
NO_MATCH_NO_SUBDIRECTORIES = 4
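
The named values above are combinations of the BIT_ members, which can be checked with ordinary bitwise arithmetic. The constants below are copied from the listing; the describe() helper is hypothetical and only shows how such a bitfield might be decoded.

```python
# Values copied from the MatchType members listed above.
BIT_MATCH = 1
BIT_ALL_SUBDIRECTORIES = 2
BIT_NO_SUBDIRECTORIES = 4

# The named results are combinations of the BIT_ flags.
MATCH_ALL_SUBDIRECTORIES = BIT_MATCH | BIT_ALL_SUBDIRECTORIES      # 3
MATCH_BUT_NO_SUBDIRECTORIES = BIT_MATCH | BIT_NO_SUBDIRECTORIES    # 5
NO_MATCH_NO_SUBDIRECTORIES = BIT_NO_SUBDIRECTORIES                 # 4

def describe(match_type):
    """Decode a match result into (matches_here, all_subdirs, no_subdirs)."""
    return (
        bool(match_type & BIT_MATCH),
        bool(match_type & BIT_ALL_SUBDIRECTORIES),
        bool(match_type & BIT_NO_SUBDIRECTORIES),
    )
```
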
class formic.formic.Matcher(pattern)

Bases: object

An abstract class that holds some pattern to be matched; matcher.match(string) returns a boolean indicating whether the string matches the pattern.

The Matcher.create() method is a Factory that creates instances of various subclasses.

static create(pattern)

Factory for Matcher instances; returns a Matcher suitable for matching the supplied pattern

match(_)

Matcher is an abstract class - this will raise a FormicError
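
A factory of this shape might look like the sketch below. The Sketch* class names are hypothetical stand-ins, the rule for choosing a subclass is an assumption rather than formic's documented behaviour, and the abstract match() raises NotImplementedError here instead of FormicError to keep the example self-contained.

```python
import fnmatch

class SketchMatcher:
    """Stand-in for an abstract matcher; subclasses implement match()."""

    def __init__(self, pattern):
        self.pattern = pattern

    @staticmethod
    def create(pattern):
        # Assumed selection rule: any fnmatch wildcard character needs the
        # wildcard matcher; everything else can be compared as a constant.
        if any(ch in pattern for ch in "*?["):
            return SketchWildcardMatcher(pattern)
        return SketchConstantMatcher(pattern)

    def match(self, string):
        raise NotImplementedError("abstract: use create() to get a subclass")

class SketchConstantMatcher(SketchMatcher):
    def match(self, string):
        # Plain equality avoids the cost of wildcard matching.
        return string == self.pattern

class SketchWildcardMatcher(SketchMatcher):
    def match(self, string):
        return fnmatch.fnmatchcase(string, self.pattern)

constant = SketchMatcher.create("__init__.py")
wildcard = SketchMatcher.create("*.py")
```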

class formic.formic.Pattern(elements)

Bases: object

Represents a single Ant Glob.

The Pattern object compiles the pattern into several components:

  • file_pattern: A pattern for matching files (not directories), eg for test/*.py, the file_pattern is *.py. This is always the text after the final / (if any). If the end of the pattern is a /, then an implicit ** is added to the end of the pattern.
  • bound_start: True if the start of the pattern is ‘bound’ to the start of the path. If the pattern starts with a /, the start is bound.
  • bound_end: True if the end of the pattern is bound to the immediate parent directory where the file matching is occurring. This is True if the pattern specifies a directory before the file pattern, eg **/test/*
  • sections: A list of Section instances. Each Section represents a contiguous series of path patterns, and Section instances are separated whenever there is a ** in the glob.

Pattern also normalises the glob, removing redundant path elements (eg **/**/test/* resolves to **/test/*) and normalises the case of the path elements (resolving difficulties with case insensitive file systems)
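
The decomposition described above can be sketched in a few lines. The decompose() function is hypothetical and much simpler than Pattern: it assumes a non-empty glob that does not end in /, ignores bound_end and case normalisation, and represents each section as a plain list of strings.

```python
def decompose(glob):
    """Split an Ant-style glob into file_pattern, bound_start and sections.

    Hypothetical helper; formic's own parsing may differ in detail.
    """
    bound_start = glob.startswith("/")
    elements = [e for e in glob.split("/") if e]
    directories, file_pattern = elements[:-1], elements[-1]

    sections, current = [], []
    for element in directories:
        if element == "**":
            # "**" closes the current section; consecutive "**" collapse.
            if current:
                sections.append(current)
                current = []
        else:
            current.append(element)
    if current:
        sections.append(current)
    return {"file_pattern": file_pattern,
            "bound_start": bound_start,
            "sections": sections}

parts = decompose("/top/second/**/sub/**/end/*")
simple = decompose("test/*.py")
```

With this sketch, decompose("**/**/test/*") and decompose("**/test/*") give the same result, which mirrors the normalisation of redundant path elements.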

all_files()

Returns True if the Pattern matches all files (in a matched directory).

This is the case when the file pattern at the end of the glob was / or /*

static create(glob)
match_directory(path_elements)

Returns a MatchType describing how the directory, expressed as a list of path elements, matches the Pattern.

If self.bound_start is True, the first Section must match from the first directory element.

If self.bound_end is True, the last Section must match the last contiguous elements of path_elements.

match_files(matched, unmatched)

Moves all matching files from the set unmatched to the set matched.

Both matched and unmatched are sets of string, the strings being unqualified file names
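
Moving names between two sets can be sketched with fnmatch and in-place set operations. The function move_matching is hypothetical and takes the file pattern as a plain string rather than a Pattern.

```python
import fnmatch

def move_matching(file_pattern, matched, unmatched):
    """Move names matching file_pattern from unmatched into matched, in place."""
    # Build the list first: a set must not change size while being iterated.
    hits = [name for name in unmatched if fnmatch.fnmatchcase(name, file_pattern)]
    matched.update(hits)
    unmatched.difference_update(hits)

matched = set()
unmatched = {"setup.py", "README.txt", "__init__.py"}
move_matching("*.py", matched, unmatched)
```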

class formic.formic.PatternSet

Bases: object

A set of Pattern instances; PatternSet provides a number of operations over the entire set.

PatternSet contains a number of implementation optimizations and is an integral part of various optimizations in FileSet.

This class is not an implementation of Apache Ant PatternSet

all_files()

Returns True if there is any Pattern in the PatternSet that matches all files (see Pattern.all_files())

Note that this method is implemented using lazy evaluation so direct access to the member _all_files is very likely to result in errors

append(pattern)

Adds a Pattern to the PatternSet

empty()

Returns True if the PatternSet is empty

extend(patterns)

Extend a PatternSet with additional patterns

patterns can either be:

iter()

An iteration generator that allows the loop to modify the PatternSet during the loop
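
One common way to make a container safe to modify while it is being looped over is to iterate over a snapshot. Whether PatternSet does exactly this is an assumption; the SketchPatternSet class below is hypothetical and stores plain strings instead of Pattern instances.

```python
class SketchPatternSet:
    """Hypothetical container showing iteration that tolerates removal."""

    def __init__(self, patterns):
        self._patterns = list(patterns)

    def iter(self):
        # Iterate over a snapshot so remove() inside the loop is safe.
        for pattern in list(self._patterns):
            yield pattern

    def remove(self, pattern):
        self._patterns.remove(pattern)

    def empty(self):
        return not self._patterns

patterns = SketchPatternSet(["**/*.py", "/docs/**", "**/test/*"])
for pattern in patterns.iter():
    if pattern.startswith("/"):
        patterns.remove(pattern)
remaining = list(patterns.iter())
```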

match_files(matched, unmatched)

Apply the include and exclude filters to those files in unmatched, moving those that are included, but not excluded, into the matched set.

Both matched and unmatched are sets of unqualified file names.

remove(pattern)

Remove a Pattern from the PatternSet

class formic.formic.Section(elements)

Bases: object

A minimal object that holds fragments of a Pattern path.

Each Section holds a list of pattern fragments matching some contiguous portion of a full path, separated by /**/ from other Section instances.

For example, the Pattern /top/second/**/sub/**/end/* is stored as a list of three Section objects:

  1. Section(["top", "second"])
  2. Section(["sub"])
  3. Section(["end"])
match_iter(path_elements, start_at)

A generator that searches over path_elements (starting from the index start_at), yielding for each match.

Each value yielded is the index into path_elements to the first element after each match. In other words, the returned index has already consumed the matching path elements of this Section.

Matches work by finding a contiguous group of path elements that match the list of Matcher objects in this Section as they are naturally paired.

This method includes an implementation optimization that simplifies the search for Section instances containing a single path element. This produces significant performance improvements.
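
The search described above can be sketched as a sliding window over the path elements. This standalone function is an illustration only: it takes the section as a list of fnmatch-style strings instead of Matcher objects and omits the single-element optimization.

```python
import fnmatch

def match_iter(section, path_elements, start_at):
    """Yield the index just past each place where section matches.

    section is a list of fnmatch-style patterns that must match a
    contiguous run of path_elements beginning at or after start_at.
    """
    width = len(section)
    for start in range(start_at, len(path_elements) - width + 1):
        window = path_elements[start:start + width]
        # Pair each path element with the pattern at the same position.
        if all(fnmatch.fnmatchcase(e, p) for e, p in zip(window, section)):
            yield start + width

path = ["top", "second", "a", "sub", "b", "sub"]
after_top = list(match_iter(["top", "second"], path, 0))
after_sub = list(match_iter(["sub"], path, 2))
```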

formic.formic.get_initial_default_excludes()

Returns the default excludes as a list of Patterns.

This will be the initial value of FileSet.DEFAULT_EXCLUDES. It is defined in the Ant documentation. Formic adds **/__pycache__/**, with the resulting list being:

  • **/__pycache__/**/*
  • **/*~
  • **/#*#
  • **/.#*
  • **/%*%
  • **/._*
  • **/CVS
  • **/CVS/**/*
  • **/.cvsignore
  • **/SCCS
  • **/SCCS/**/*
  • **/vssver.scc
  • **/.svn
  • **/.svn/**/*
  • **/.DS_Store
  • **/.git
  • **/.git/**/*
  • **/.gitattributes
  • **/.gitignore
  • **/.gitmodules
  • **/.hg
  • **/.hg/**/*
  • **/.hgignore
  • **/.hgsub
  • **/.hgsubstate
  • **/.hgtags
  • **/.bzr
  • **/.bzr/**/*
  • **/.bzrignore
formic.formic.get_path_components(directory)

Breaks a path to a directory into a (drive, list-of-folders) tuple

Parameters: directory
Returns: a tuple consisting of the drive (if any) and an ordered list of folder names
formic.formic.get_version()

Returns the version of formic.

This method retrieves the version from VERSION.txt, and it should be exactly the same as the version retrieved from the package manager

formic.formic.is_root(directory)

Returns true if the directory is root (eg / on UNIX or c:\ on Windows)

formic.formic.list_to_tree(files)

Converts a list of filenames into a directory tree structure.

formic.formic.reconstitute_path(drive, folders)

Reverts a tuple from get_path_components into a path.

Parameters:
  • drive – A drive (eg ‘c:’). Only applicable for NT systems
  • folders – A list of folder names
Returns:

A path comprising the drive and list of folder names. The path terminates with an os.path.sep only if it is a root directory
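
The round trip between a path and its (drive, folders) form can be sketched with os.path. The function names split_components and join_components are hypothetical stand-ins, and the sketch assumes absolute paths only.

```python
import os

def split_components(directory):
    """Return (drive, folders) for an absolute directory path."""
    drive, rest = os.path.splitdrive(os.path.normpath(directory))
    folders = [part for part in rest.split(os.sep) if part]
    return drive, folders

def join_components(drive, folders):
    """Rebuild the path; only a root directory ends with a separator."""
    return drive + os.sep + os.sep.join(folders)

drive, folders = split_components(os.sep + os.path.join("usr", "local", "lib") + os.sep)
rebuilt = join_components(drive, folders)
root = join_components(*split_components(os.sep))
```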

formic.formic.tree_walk(directory, tree)

Walks a tree returned by list_to_tree returning a list of 3-tuples as if from os.walk().

formic.formic.walk_from_list(files)

A function that mimics os.walk() by simulating a directory with the list of files passed as an argument.

Parameters: files – A list of file paths
Returns: A function that mimics os.walk() walking a directory containing only the files listed in the argument
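
A function of this kind can be sketched without touching the file system. The fake_walk() below is a hypothetical stand-in, not formic's implementation, and has not been tested as the walk argument of FileSet; it assumes /-separated relative paths and only reproduces the (dirname, subdirectoryNames, fileNames) shape of os.walk().

```python
import posixpath

def fake_walk(files):
    """Return a walk(directory) function over an in-memory list of files."""
    tree = {"": ([], [])}          # dir path -> (subdir names, file names)
    for path in files:
        parts = path.split("/")
        parent = ""
        for name in parts[:-1]:
            child = posixpath.join(parent, name)
            if child not in tree:
                tree[child] = ([], [])
                tree[parent][0].append(name)
            parent = child
        if parts[-1] not in tree[parent][1]:   # ignore duplicate entries
            tree[parent][1].append(parts[-1])

    def walk(directory):
        # Top-down order, like os.walk(): a directory before its children.
        stack = [""]
        while stack:
            current = stack.pop()
            subdirs, names = tree[current]
            here = posixpath.join(directory, current) if current else directory
            yield here, list(subdirs), list(names)
            stack.extend(posixpath.join(current, d) for d in reversed(subdirs))

    return walk

files = ["CVS/error.py", "silly/silly1.txt", "1/2/3.py",
         "silly/silly3.txt", "1/2/4.py", "silly/silly3.txt"]
entries = list(fake_walk(files)("."))
```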

command Module

The command-line glue-code for formic. Call main() with the command-line arguments.

Full usage of the command is:

usage: formic [-i [INCLUDE [INCLUDE ...]]] [-e [EXCLUDE [EXCLUDE ...]]]
             [--no-default-excludes] [--no-symlinks] [-r] [-h] [--usage]
             [--version]
             [directory]

Search the file system using Apache Ant globs

Directory:
  directory             The directory from which to start the search (defaults
                        to current working directory)

Globs:
  -i [INCLUDE [INCLUDE ...]], --include [INCLUDE [INCLUDE ...]]
                        One or more Ant-like globs in include in the search.If
                        not specified, then all files are implied
  -e [EXCLUDE [EXCLUDE ...]], --exclude [EXCLUDE [EXCLUDE ...]]
                        One or more Ant-like globs in include in the search
  --no-default-excludes
                        Do not include the default excludes
  --no-symlinks         Do not include symlinks

Output:
   -r, --relative       Print file paths relative to directory.

Information:
  -h, --help            Prints this help and exits
  --usage               Prints additional help on globs and exits
  --version             Prints the version of formic and exits
formic.command.create_parser()

Creates and returns the command line parser, an argparse.ArgumentParser instance.

formic.command.entry_point()

Entry point for command line; calls main() and then sys.exit() with the return value.

formic.command.main(*kw)

Command line entry point; arguments must match those defined in create_parser(); returns 0 for success, else 1.

Example:

command.main("-i", "**/*.py", "--no-default-excludes")

Runs formic printing out all .py files in the current working directory and its children to sys.stdout.

If kw is None, main() will use sys.argv.
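
A parser accepting the options in the usage text above could be built with argparse roughly as follows. This is a sketch, not the code of create_parser(): the function name build_parser and the dest names are assumptions, and the --usage and --version options are omitted.

```python
import argparse

def build_parser():
    """Build a parser for the options shown in the usage text."""
    parser = argparse.ArgumentParser(
        prog="formic",
        description="Search the file system using Apache Ant globs")
    # Optional positional: the directory from which to start the search.
    parser.add_argument("directory", nargs="?", default=None)
    # nargs="*" lets each option take zero or more globs.
    parser.add_argument("-i", "--include", nargs="*", default=[])
    parser.add_argument("-e", "--exclude", nargs="*", default=[])
    # The "--no-..." flags switch off behaviour that is on by default.
    parser.add_argument("--no-default-excludes", dest="default_excludes",
                        action="store_false")
    parser.add_argument("--no-symlinks", dest="symlinks", action="store_false")
    parser.add_argument("-r", "--relative", action="store_true")
    return parser

args = build_parser().parse_args(["-i", "**/*.py", "--no-default-excludes"])
```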
