Channel: Hacker News

Show HN: PyFilesystem 2.0 – A Python interface to filesystems of all kinds

I'd like to announce version 2.0.0 of PyFilesystem, which is now available on PyPI.

PyFilesystem is a Python module I started some time in 2008, and since then it has been very much a part of my personal standard library. I've used it in personal and professional projects, as have many other developers and organisations.

Recap

If you aren't familiar with PyFilesystem, it's an abstraction layer for filesystems. Essentially anything with files and directories (hard drive, zip file, FTP server, network filesystems, etc.) may be wrapped with a common interface. With it, you can write code that is agnostic as to where the files are physically located.

Here's a quick example that recursively counts the lines of code in a directory:

def count_python_loc(fs):
    """Count non-blank lines of Python code."""
    count = 0
    for path in fs.walk.files(filter=['*.py']):
        with fs.open(path) as python_file:
            count += sum(1 for line in python_file if line.strip())
    return count

from fs import open_fs
projects_fs = open_fs('~/projects')
print(count_python_loc(projects_fs))

The fs argument to count_python_loc is an FS object, which encapsulates everything you would need to do with a filesystem. Because of this abstraction, the same code will work with any filesystem. For instance, counting the lines of code in a zip file is a single line change:

projects_fs = open_fs('zip://projects.zip')

See my previous posts on PyFilesystem for more back-story.

© 2016 Will McGugan

The tree method renders the filesystem structure with unicode box drawing characters. This can be a nice way of reporting file changes in a command line app, and a useful debugging aid in general.

The fact that there are trees on my wallpaper is entirely coincidental.

Why the update?

A lot has happened since 2008. Python 3 happened. The IO library happened. Scandir happened. And while PyFilesystem has kept up with those developments, the code has suffered from the weight of small incremental changes. Not to the degree that it required a re-write perhaps, but a new version gave me the opportunity to make improvements to the API which couldn't be done without breaking a lot of code.

The re-write was guided by 8 years of observations regarding what developers wanted from the library: what worked well, what felt awkward, and the numerous edge cases in creating the illusion that all filesystems work alike (they really don't). A lot of agonizing has gone into the design of the new API to make it simple to use without sacrificing functionality. This often required breaking up large methods into more atomic methods that do one thing and do it well.

Another motivation for this version was to make it easier to implement new filesystems. It turns out that all you need to implement any filesystem is to write 7 methods, which I find somewhat remarkable. The new API has been designed to make it much easier to develop your own filesystems in general, with more of the heavy lifting being done by the base class. I'm hoping this will encourage more developers to release new filesystems, and for pre-2.0.0 filesystems to be ported (which I'll gladly assist with).

So what is new?

Cleaner code

The new project is a unified Python 2 and 3 code-base. It's legacy-free code with 100% test coverage.

I wanted it to be bullet-proof because PyFilesystem is an integral part of Moya. Moya's static server uses PyFilesystem to serve assets. Here's a screenshot of that in action:


Moya uses PyFilesystem in its static server. This screenshot shows what happens when you statically serve an FTPFS.

Moving and copying got simpler.

Here's how you compress your projects directory as a zip file:

from fs.copy import copy_fs
copy_fs('~/projects', 'zip://~/projects.zip')

This works because the fs.copy and fs.move modules accept both FS objects and FS URLs.

Simple file information

File information has been categorised under a few namespaces, so you can request only the information you are interested in (potentially avoiding needless system calls). Here's an example:

>>> from fs import open_fs
>>> my_fs = open_fs('.')
>>> info = my_fs.getinfo('setup.py', namespaces=['details', 'access'])
>>> info.name
'setup.py'
>>> info.is_dir
False
>>> info.user
'will'
>>> info.permissions
Permissions(user='rw', group='r', other='r')
>>> info.modified
datetime.datetime(2016, 11, 27, 0, 17, 29, tzinfo=<UTC>)

Directory scanning is more powerful

The original PyFilesystem had a design flaw that we were unfortunately stuck with: when you listed a directory, you couldn't know which paths were files and which were directories in a single call. A workaround was to make one call to retrieve just the directories and another to retrieve the files. Making two calls to retrieve the directory listing was inefficient for network filesystems. Another workaround used stat information, but that only worked for the OS filesystem.

In fs 2.0, the directory listing methods return Info objects which have an is_dir flag, so there is no need for any workarounds. There is also a page parameter which allows you to paginate large directories (handy if you have millions of files).

Directory walking has a simpler interface.

To complement the directory scanning, there is a new directory walking class which supports filtering by wildcard pattern. Having an external object to do the directory walking allows for more flexibility. For instance, the copy_fs function accepts an optional walker parameter which can be used to specify which files should be copied.

Here's how you would use a Walker object to compress only .py files, while ignoring Git directories:

from fs.copy import copy_fs
from fs.walk import Walker
py_walker = Walker(filter=['*.py'], exclude_dirs=['*.git'])
copy_fs("~/projects", "zip://~/projects.zip", py_walker)

Closer to the metal

Not all filesystems are created equal; a zip file is very different from your HD, and completely different from a filesystem in the cloud. Designing a single API that makes them interchangeable is challenging. The original PyFilesystem API took the approach of supporting a common denominator of features, and simply didn't support things that weren't more or less universal (such as permissions). The new version has a different philosophy and will attempt to expose as much as possible.

Serializable API

The new API is designed to be very easy to serialize, making it easier to implement network filesystems with JSONRPC, XML, REST, etc.

In the early days I considered such things to be a novelty, but it's proven to be something that developers often want. The new API should make that less of a challenge.

Should you upgrade?

There is a lot of functionality packed into the original PyFilesystem, such as exposing filesystems over HTTP and other network protocols, OS integration (FUSE and Dokan), command line apps, etc. All very cool stuff, but it did make for a lengthy list of requirements. The new version is leaner and requires only a handful of pure-Python dependencies. The other cool stuff will be distributed as additional namespace packages.

If you require any of those features, you may want to stick with the pre-2.0 version until I've ported everything. Ditto if you need S3 support or one of the third-party implementations.

Otherwise, I think you will find fs 2.0.0 a solid addition to your standard library.

How to install

You can install fs with pip as follows:

pip install fs

Add the -U switch if you want to upgrade from a previous version.

Alternatively, you can check out PyFilesystem2 on GitHub.

Feedback?

Now is the best time to suggest changes and new features. Comments and PRs are most welcome!

