File and Directory Classes

File Objects

class mediafs.File(path, parent=None)[source]

Object that represents a file in the filesystem

abspath

The absolute path to the file or directory. Uses os.path.abspath(). Lazily evaluated and cached.

atime()

Last access time as reported by the underlying filesystem. Calls os.path.getatime() on the file or directory and returns the result as a datetime object.
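
The conversion described above can be sketched with the standard library alone. The helper name access_time is hypothetical, and whether mediafs builds a local-time or UTC datetime is not stated in this reference; local time is assumed here.

```python
import os
import tempfile
from datetime import datetime


def access_time(path):
    """Return the last access time of `path` as a datetime (hypothetical helper)."""
    # os.path.getatime() returns seconds since the epoch as a float;
    # fromtimestamp() turns that into a naive local-time datetime.
    return datetime.fromtimestamp(os.path.getatime(path))


# Create a scratch file so the example does not depend on any existing path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
scratch = tmp.name

atime = access_time(scratch)
os.remove(scratch)
```

The same pattern applies to mtime() with os.path.getmtime().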

crc(refresh=False)[source]

Calculate the CRC for this file. The result is cached, so subsequent calls do not result in calculating the CRC multiple times. If refresh is True, then the result is recalculated.
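
A minimal sketch of the caching behaviour, assuming a CRC-32 computed with zlib.crc32(). The class CachedCrc and its _crc attribute are hypothetical; the reference does not say which CRC variant or chunk size mediafs uses.

```python
import os
import tempfile
import zlib


class CachedCrc:
    """Hypothetical stand-in showing a cached checksum with a refresh flag."""

    def __init__(self, path):
        self.path = path
        self._crc = None  # None means "not computed yet"

    def crc(self, refresh=False):
        if self._crc is None or refresh:
            value = 0
            with open(self.path, "rb") as fh:
                # Read in chunks so large files are never fully in memory.
                for chunk in iter(lambda: fh.read(65536), b""):
                    value = zlib.crc32(chunk, value)
            self._crc = value
        return self._crc


fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as fh:
    fh.write(b"abc")

demo = CachedCrc(demo_path)
first = demo.crc()                 # computed from disk
with open(demo_path, "wb") as fh:
    fh.write(b"abd")               # change the file behind the cache
stale = demo.crc()                 # still the cached value
fresh = demo.crc(refresh=True)     # recomputed from the new contents
os.remove(demo_path)
```

The stale value illustrates why refresh=True matters when a file may have changed since the first call.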

deserialize(attrs)

Takes a dict object and returns a new instance of this class with all attributes initialized to the values contained in the dict.

exists()

Does the file exist? Calls os.path.exists() on the file or directory and returns the result.

fasthash(refresh=False)[source]

Calculate a hash for this file that works well on larger files but is optimized for speed. The result is cached, so subsequent calls do not result in calculating the hash multiple times. If refresh is True, then the result is recalculated.

get(key, default=None)

Helper method for getting values from the metadata dict. Primarily useful for shortening Directory.query() lambda functions.

Example:

directory.query(lambda f: 'author' in f.metadata and f.metadata['author'] == "The Clash")

can be shortened to:

directory.query(lambda f: f.get('author') == "The Clash")

The default argument is the value that will be returned if key is not a valid key in the metadata dict. This is useful if you are expecting a particular type and want to do some operation on that type. For example:

directory.query(lambda f: f.get('year', default=0) > 1990)
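
The role of the default argument can be illustrated without mediafs. In this sketch the Item class is a hypothetical stand-in for an object that carries a metadata dict, and get() is assumed to delegate to dict.get().

```python
class Item:
    """Hypothetical stand-in for an object carrying a metadata dict."""

    def __init__(self, name, metadata=None):
        self.name = name
        self.metadata = metadata or {}

    def get(self, key, default=None):
        # Assumed behaviour: a missing key yields `default`.
        return self.metadata.get(key, default)


items = [
    Item("london_calling.mp3", {"author": "The Clash", "year": 1979}),
    Item("nevermind.mp3", {"author": "Nirvana", "year": 1991}),
    Item("untagged.mp3"),
]

# Without default=0, the untagged item would give None, and None > 1990
# raises TypeError on Python 3.
recent = [i.name for i in items if i.get("year", default=0) > 1990]
by_author = [i.name for i in items if i.get("author") == "The Clash"]
```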

hash()[source]

For files, instead of returning the relative path of the file, return the hash, so that if a file is moved or renamed the metadata will remain associated with it. This will also result in duplicate files having the same metadata (which is the intended behavior).

matches(other)

Returns True if this file or directory is the same as another file or directory. Compares by hash, so file1.matches(file2) == True if file1 and file2 have identical contents.
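
Content-based matching can be sketched as a digest comparison. The helpers content_digest and same_contents are hypothetical, and MD5 is an assumption; this reference does not say which of the available hashes matches() relies on.

```python
import hashlib
import os
import shutil
import tempfile


def content_digest(path):
    """MD5 hex digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    # Two paths match when their digests are equal, whatever their names.
    return content_digest(path_a) == content_digest(path_b)


workdir = tempfile.mkdtemp()
paths = {}
for name, data in [("a.txt", b"same"), ("copy_of_a.txt", b"same"), ("b.txt", b"other")]:
    paths[name] = os.path.join(workdir, name)
    with open(paths[name], "wb") as fh:
        fh.write(data)

duplicate = same_contents(paths["a.txt"], paths["copy_of_a.txt"])
different = same_contents(paths["a.txt"], paths["b.txt"])
shutil.rmtree(workdir)
```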

md5(refresh=False)[source]

Calculate the MD5 sum for this file. The result is cached, so subsequent calls do not result in calculating the MD5 sum multiple times. If refresh is True, then the result is recalculated.

metadata

The metadata dict for this file or directory

mtime()

Last modified time as reported by the underlying filesystem. Calls os.path.getmtime() on the file or directory and returns the result as a datetime object.

relpath

The file or directory path relative to the root directory.

rename(newName, syscall=True)

Renames the file or directory. Raises a FileExistsError exception if the new name already exists.

If the syscall argument is True, then os.rename() will be called on the underlying file or directory. Setting this to False is primarily useful for keeping things in sync if you know a rename occurred and want to avoid the overhead of a refresh() call.
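
The effect of the syscall flag can be sketched with a small in-memory index. The Index class is hypothetical, and checking for collisions against the index rather than the disk is an assumption about how such a flag would have to work.

```python
import os
import shutil
import tempfile


class Index:
    """Hypothetical in-memory record of the names in one directory."""

    def __init__(self, dirpath):
        self.dirpath = dirpath
        self.names = set(os.listdir(dirpath))

    def rename(self, old, new, syscall=True):
        if new in self.names:
            raise FileExistsError(new)
        if syscall:
            os.rename(os.path.join(self.dirpath, old),
                      os.path.join(self.dirpath, new))
        # Update the index whether or not the disk was touched.
        self.names.discard(old)
        self.names.add(new)


workdir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(workdir, name), "w") as fh:
        fh.write(name)

idx = Index(workdir)
idx.rename("a.txt", "c.txt")                  # renames on disk and in the index

# Something else renames b.txt; record it without a second os.rename().
os.rename(os.path.join(workdir, "b.txt"), os.path.join(workdir, "d.txt"))
idx.rename("b.txt", "d.txt", syscall=False)

try:
    idx.rename("c.txt", "d.txt")
    collided = False
except FileExistsError:
    collided = True

on_disk = sorted(os.listdir(workdir))
shutil.rmtree(workdir)
```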

root

A reference to the root directory object

serialize()

Returns a dict object containing the attributes of this object. Used for serializing the directory tree to a file.
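
A round trip through serialize() and deserialize() might look like the sketch below. The Node class, its attribute names, and the use of JSON as the on-disk format are all assumptions made for illustration.

```python
import json


class Node:
    """Hypothetical object with a serialize()/deserialize() pair."""

    def __init__(self, path, metadata=None):
        self.path = path
        self.metadata = metadata or {}

    def serialize(self):
        # A plain dict of attributes, safe to hand to json.dumps().
        return {"path": self.path, "metadata": self.metadata}

    @classmethod
    def deserialize(cls, attrs):
        # Build a new instance from a dict produced by serialize().
        return cls(attrs["path"], attrs.get("metadata"))


original = Node("music/track.mp3", {"year": 1979})
text = json.dumps(original.serialize())
restored = Node.deserialize(json.loads(text))
```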

size

The size of the file or directory contents in bytes. Lazily evaluated and cached.

stat()

Calls os.stat() on the file or directory and returns the result.

Directory Objects

class mediafs.Directory(path, parent=None)[source]

Object that represents a directory in the filesystem

__contains__(key)[source]

Checks if a given file or directory name is contained in this directory

__getitem__(key)[source]

Directory objects support several indexing forms. Each one returns either a single object or a list of objects, which is convenient when you want to assign the results to a variable. The searching methods filter(), search(), query(), and all(), by contrast, are generators.

Directories support the following syntaxes for indexing:

  • An ellipsis object returns a list of all children, recursively.

    directory[...]

    (same as list(directory.all(recursive=True)))

  • An integer, which is treated as an index and returns one item based on the directory ordering. Because the ordering is precalculated, this is O(1). Returns exactly one item.

    directory[2]

  • A slice, which is treated as a range of indices based on the directory ordering.

    directory[1:3]

  • An empty slice, which returns a list of items in the directory.

    directory[:]

    (same as list(directory.all(recursive=False)))

  • A string key, which is treated as a file or directory name and uses a dict-based lookup for O(1) lookups. Returns exactly one item.

    directory["asdf.txt"]

  • A string which contains either a * or a ?. This string is passed to the Python standard library module fnmatch to support searches, and returns a list of files or directories that match the pattern. See the fnmatch documentation for more information.

    directory["*.txt"]

    (same as list(directory.filter("*.txt")))
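
The dispatch on key type listed above can be sketched as follows. Listing is a hypothetical flat container, so the ellipsis case cannot recurse as the real class does, and fnmatch.fnmatchcase() is used here only to keep the output identical on every platform.

```python
import fnmatch


class Listing:
    """Hypothetical flat container mimicking the indexing rules above."""

    def __init__(self, names):
        self.order = list(names)               # precomputed ordering
        self.contents = {n: n for n in names}  # name -> item lookup

    def __getitem__(self, key):
        if key is Ellipsis:
            # No subdirectories in this sketch, so "recursive" is just a copy.
            return list(self.order)
        if isinstance(key, int):
            return self.contents[self.order[key]]
        if isinstance(key, slice):
            return [self.contents[n] for n in self.order[key]]
        if isinstance(key, str):
            if "*" in key or "?" in key:
                return [n for n in self.order if fnmatch.fnmatchcase(n, key)]
            return self.contents[key]
        raise TypeError("unsupported index type: %r" % type(key))


listing = Listing(["a.txt", "b.jpg", "c.txt", "d.png"])
by_index = listing[1]
by_slice = listing[1:3]
by_name = listing["c.txt"]
by_pattern = listing["*.txt"]
```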

__len__()[source]

Return the number of files and directories in this directory

abspath

The absolute path to the file or directory. Uses os.path.abspath(). Lazily evaluated and cached.

all(recursive=False, reverse=False, dirs=True, files=True)[source]

A generator that yields all files and subdirectories contained within this directory.

  • If recursive is True, then it will also yield all items contained in those subdirectories.
  • If reverse is True, then it will iterate in reverse order.
  • The dirs argument indicates whether or not directories should be yielded.
  • The files argument indicates whether or not files should be yielded.
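
How the four arguments interact can be sketched with a plain generator over paths. The function walk_all is hypothetical; sorting by name and yielding a directory before its children are assumptions, since the actual ordering is not described here.

```python
import os
import shutil
import tempfile


def walk_all(path, recursive=False, reverse=False, dirs=True, files=True):
    """Yield entry paths under `path` (hypothetical stand-in for all())."""
    for name in sorted(os.listdir(path), reverse=reverse):
        full = os.path.join(path, name)
        is_dir = os.path.isdir(full)
        if (is_dir and dirs) or (not is_dir and files):
            yield full
        if is_dir and recursive:
            # Descend even when dirs=False so nested files are still found.
            yield from walk_all(full, recursive, reverse, dirs, files)


root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(root, rel), "w") as fh:
        fh.write("x")


def rel_names(paths):
    # Use forward slashes so the result looks the same on every OS.
    return [os.path.relpath(p, root).replace(os.sep, "/") for p in paths]


top_level = rel_names(walk_all(root))
everything = rel_names(walk_all(root, recursive=True))
files_only = rel_names(walk_all(root, recursive=True, dirs=False))
shutil.rmtree(root)
```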

atime()

Last access time as reported by the underlying filesystem. Calls os.path.getatime() on the file or directory and returns the result as a datetime object.

contents

The dict representing the contents of this directory. If this directory has not been refreshed yet, accessing this property will trigger a refresh(recursive=False) before returning the dict.

If you have code accessing a single specific file or directory object in an inner loop, a small optimization could be calling directory.contents[filename] instead of directory[filename], due to the number of overloads in Directory.__getitem__.

classmethod deserialize(attrs)[source]

Takes a dict object, and returns a new instance of this class with all attributes initialized to the values contained in the dict.

exists()

Does the file exist? Calls os.path.exists() on the file or directory and returns the result.

filter(pattern, recursive=False, dirs=True, files=True, ignoreCase=True)[source]

Uses the Python stdlib fnmatch library to search the filesystem.

If ignoreCase is True, then fnmatch.fnmatch() will be used, and filenames will be converted to lowercase before comparisons are made.

If ignoreCase is False, then fnmatch.fnmatchcase() will be used.

See https://docs.python.org/library/fnmatch.html for more information about the pattern syntax.

recursive, dirs, and files arguments are passed to Directory.all().
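
The difference between the two modes can be sketched on a plain list of names. The helper filter_names is hypothetical. Note one deliberate deviation: it lowercases both the name and the pattern and calls fnmatch.fnmatchcase() in both branches, so the result does not depend on the platform-specific case handling of fnmatch.fnmatch().

```python
import fnmatch


def filter_names(names, pattern, ignore_case=True):
    """Yield names matching a shell-style pattern (hypothetical helper)."""
    for name in names:
        if ignore_case:
            # Lowercase both sides for a platform-independent comparison.
            if fnmatch.fnmatchcase(name.lower(), pattern.lower()):
                yield name
        elif fnmatch.fnmatchcase(name, pattern):
            yield name


names = ["Notes.TXT", "readme.txt", "photo.jpg"]
insensitive = list(filter_names(names, "*.txt"))
sensitive = list(filter_names(names, "*.txt", ignore_case=False))
```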

get(key, default=None)

Helper method for getting values from the metadata dict. Primarily useful for shortening Directory.query() lambda functions.

Example:

directory.query(lambda f: 'author' in f.metadata and f.metadata['author'] == "The Clash")

can be shortened to:

directory.query(lambda f: f.get('author') == "The Clash")

The default argument is the value that will be returned if key is not a valid key in the metadata dict. This is useful if you are expecting a particular type and want to do some operation on that type. For example:

directory.query(lambda f: f.get('year', default=0) > 1990)

hash()

Return a hash suitable for storing the metadata dict for this object. This should be unique among all files and directories in the RootDirectory object. For directories, it’s best to use the relative path. For files, we can hash the file and use that, which means that moving or renaming the file won’t lose track of data.

matches(other)

Returns True if this file or directory is the same as another file or directory. Compares by hash, so file1.matches(file2) == True if file1 and file2 have identical contents.

metadata

The metadata dict for this file or directory

mtime()

Last modified time as reported by the underlying filesystem. Calls os.path.getmtime() on the file or directory and returns the result as a datetime object.

order

A list representing the order of the items in this directory. Lazily evaluated and cached.

Accessing this property will trigger refresh(recursive=False) if a refresh has never been run on this directory.

query(query, recursive=False, dirs=True, files=True)[source]

Uses a custom function to search the filesystem. That function is passed a single argument, an FSObject, and should return a boolean that determines if the file matches.

recursive, dirs, and files arguments are passed to Directory.all().

Examples:

All files that are named “file1.txt” or “file2.txt”, recursively:

>>> directory.query(lambda f: f.name in ("file1.txt", "file2.txt"), recursive=True)

All files larger than 1024 bytes:

>>> directory.query(lambda f: f.size > 1024, dirs=False)

All files and directories that start with E:

>>> directory.query(lambda f: f.name.startswith("E"))

All files modified within the last 7 days (mtime() is a method, so it must be called):

>>> from datetime import datetime, timedelta
>>> directory.query(lambda f: f.mtime() > (datetime.now() - timedelta(days=7)), dirs=False)

All directories with more than 10 items:

>>> directory.query(lambda d: len(d) > 10, recursive=True, files=False)

All directories that contain a file called “asdf.txt”:

>>> directory.query(lambda d: "asdf.txt" in d, recursive=True, files=False)

refresh(*files, **kwargs)[source]

Rescans the filesystem and rebuilds the index for this directory. If any files are specified, then refresh() will only scan those files. Otherwise it will scan all files.

If recursive=True is passed in, then refresh() will also be called on all subdirectories.

relpath

The file or directory path relative to the root directory.

rename(newName, syscall=True)

Renames the file or directory. Raises a FileExistsError exception if the new name already exists.

If the syscall argument is True, then os.rename() will be called on the underlying file or directory. Setting this to False is primarily useful for keeping things in sync if you know a rename occurred and want to avoid the overhead of a refresh() call.

root

A reference to the root directory object

search(regex, recursive=False, dirs=True, files=True, flags=2)[source]

Uses a regex as a query string to search the filesystem. Matching is case-insensitive by default. The value of the flags argument is passed directly to re.compile(); see the documentation for the re module for the available flags.

The default value for flags is re.IGNORECASE.

recursive, dirs, and files arguments are passed to Directory.all().

Example: directory.search(r'(.*)\.txt')
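
The role of the flags argument can be sketched on a plain list of names. The helper search_names is hypothetical, and using the compiled pattern's search() method (rather than match() or fullmatch()) is an assumption.

```python
import re


def search_names(names, regex, flags=re.IGNORECASE):
    """Yield names matching `regex` (hypothetical helper)."""
    compiled = re.compile(regex, flags)
    for name in names:
        # Assumption: a match anywhere in the name counts.
        if compiled.search(name):
            yield name


names = ["Notes.TXT", "readme.txt", "photo.jpg"]
default_flags = list(search_names(names, r"(.*)\.txt"))
case_sensitive = list(search_names(names, r"(.*)\.txt", flags=0))
```

Passing flags=0 switches off case-insensitive matching; the numeric default of 2 in the signature is the integer value of re.IGNORECASE.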

serialize()

Returns a dict object containing the attributes of this object. Used for serializing the directory tree to a file.

size

For directories, recursively calculate the size of the contents of the directory. This value is lazily evaluated and cached.
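
A lazily evaluated, cached recursive size could be sketched as below. The SizedDir class is hypothetical, and summing os.path.getsize() over an os.walk() traversal is an assumption about how the total is computed.

```python
import os
import shutil
import tempfile


class SizedDir:
    """Hypothetical directory wrapper with a lazily computed, cached size."""

    def __init__(self, path):
        self.path = path
        self._size = None  # filled in on first access

    @property
    def size(self):
        if self._size is None:
            total = 0
            for dirpath, _dirnames, filenames in os.walk(self.path):
                for name in filenames:
                    total += os.path.getsize(os.path.join(dirpath, name))
            self._size = total
        return self._size


root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
with open(os.path.join(root, "a.bin"), "wb") as fh:
    fh.write(b"123")            # 3 bytes
with open(os.path.join(root, "sub", "b.bin"), "wb") as fh:
    fh.write(b"12345")          # 5 bytes

d = SizedDir(root)
first_size = d.size             # walks the tree once

with open(os.path.join(root, "c.bin"), "wb") as fh:
    fh.write(b"1")              # added after the value was cached
cached_size = d.size            # no second walk, so the new file is not counted
shutil.rmtree(root)
```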

stat()

Calls os.stat() on the file or directory and returns the result.