fsspec is a de facto standard in the Python ecosystem.

It's the 20th most downloaded Python package*

fsspec is a filesystem API specification.

Main benefit: one frontend, different backends.

You can use the same Python code (frontend) to work with different filesystems (backends).

fsspec filesystem functions (frontend):

  • put_file(..)

  • mkdir(..)

  • rmdir(..)

  • ls(..)

  • glob(..)

  • ...

Filesystems (backends):

  • S3

  • Azure Blob Storage

  • GCS

  • local storage

  • ...
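A minimal sketch of the "one frontend, different backends" idea, assuming fsspec is installed. The helper name `upload_and_list` and the paths are made up for illustration. It runs the same function against local storage and against fsspec's built-in in-memory filesystem, which stands in here for a remote backend such as S3:

```python
import os
import tempfile
import uuid

import fsspec


def upload_and_list(fs, local_file, remote_dir):
    """Copy one local file into remote_dir, then list and glob that directory.

    Hypothetical helper: it only calls methods that are part of the fsspec
    filesystem interface, so it does not care which backend `fs` talks to.
    """
    fs.mkdir(remote_dir)
    remote_path = remote_dir.rstrip("/") + "/" + os.path.basename(local_file)
    fs.put_file(local_file, remote_path)
    listed = fs.ls(remote_dir, detail=False)
    matched = fs.glob(remote_dir.rstrip("/") + "/*.txt")
    return sorted(os.path.basename(p) for p in listed), len(matched)


with tempfile.TemporaryDirectory() as tmp:
    # Create a small source file to upload.
    source = os.path.join(tmp, "hello.txt")
    with open(source, "w") as f:
        f.write("hello")

    local_fs = fsspec.filesystem("file")     # local storage backend
    memory_fs = fsspec.filesystem("memory")  # in-memory backend shipped with fsspec

    local_result = upload_and_list(local_fs, source, tmp + "/target")
    # The in-memory store is shared per process, so use a unique directory name.
    memory_result = upload_and_list(memory_fs, source, "/demo_" + uuid.uuid4().hex)

print(local_result, memory_result)
```

With s3fs, adlfs, or gcsfs installed and credentials configured, the same helper should accept a filesystem created for those backends instead.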

Implementations of the specification link the frontend and backend:

  • fsspec <> s3fs <> (aio)botocore <> S3

  • fsspec <> adlfs <> Azure SDK for Python <> Azure Blob Storage

  • fsspec <> gcsfs <> Google Storage API <> GCS

  • fsspec <> implementations.local <> Python os module <> local storage

  • fsspec <> ...

s3fs, adlfs, gcsfs, and implementations.local are implementations:

  • s3fs, adlfs, gcsfs ➜ maintained outside fsspec repo, need separate pip install

  • implementations.local ➜ maintained inside fsspec repo
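One way to see this split is fsspec's registry of known implementations, which maps a protocol name to the class that serves it. This is a small sketch assuming a reasonably recent fsspec version; the exact registry entries may differ between versions, and reading them does not import or require the external packages:

```python
from fsspec.registry import known_implementations

# Each entry maps a protocol name to the import path of its implementation class.
for protocol in ("s3", "abfs", "gcs", "file"):
    print(protocol, "->", known_implementations[protocol]["class"])

# Collect the top-level package behind each protocol.
packages = {
    protocol: known_implementations[protocol]["class"].split(".")[0]
    for protocol in ("s3", "abfs", "gcs", "file")
}
```

Entries whose class path starts with `fsspec.` ship with fsspec itself; the others point at separately installed packages.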

fsspec defines the filesystem specification as an abstract Python class:

class AbstractFileSystem(..)

It also defines a subclass for async operations:

class AsyncFileSystem(AbstractFileSystem)

Implementations subclass AbstractFileSystem directly or via AsyncFileSystem:

  • s3fs: class S3FileSystem(AsyncFileSystem)

  • adlfs: class AzureBlobFileSystem(AsyncFileSystem)

  • gcsfs: class GCSFileSystem(AsyncFileSystem)

  • fsspec.implementations: class LocalFileSystem(AbstractFileSystem)

  • ...
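The part of this hierarchy that lives inside fsspec can be checked directly. A short sketch, assuming only that fsspec is installed (the cloud implementations are left out because they need separate packages):

```python
from fsspec.asyn import AsyncFileSystem
from fsspec.implementations.local import LocalFileSystem
from fsspec.spec import AbstractFileSystem

# AsyncFileSystem extends the base specification class.
async_is_subclass = issubclass(AsyncFileSystem, AbstractFileSystem)

# LocalFileSystem subclasses AbstractFileSystem directly, not via AsyncFileSystem.
local_is_subclass = issubclass(LocalFileSystem, AbstractFileSystem)
local_is_async = issubclass(LocalFileSystem, AsyncFileSystem)

print(async_is_subclass, local_is_subclass, local_is_async)
```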

Each implementation uses backend-specific code to implement the functions defined in the base class.
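To illustrate the pattern without any dependencies, here is a toy sketch using only the standard library. `ToyFileSystem` and `DictFileSystem` are hypothetical names and this is not fsspec's actual code; it only shows a base class that defines the interface and a subclass that fills in the backend-specific part:

```python
import fnmatch
from abc import ABC, abstractmethod


class ToyFileSystem(ABC):
    """Toy stand-in for a filesystem specification (not fsspec's real class)."""

    @abstractmethod
    def ls(self, path):
        """Backend-specific: return the paths directly under `path`."""

    def glob(self, pattern):
        """Shared logic: list the parent directory, then filter by pattern."""
        directory = pattern.rsplit("/", 1)[0]
        return [p for p in self.ls(directory) if fnmatch.fnmatchcase(p, pattern)]


class DictFileSystem(ToyFileSystem):
    """Toy backend that keeps 'files' in a dict keyed by path."""

    def __init__(self, files):
        self.files = files

    def ls(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(name for name in self.files if name.startswith(prefix))


fs = DictFileSystem({"/data/a.csv": b"", "/data/b.txt": b"", "/other/c.csv": b""})
listing = fs.ls("/data")
csv_files = fs.glob("/data/*.csv")
print(listing, csv_files)
```

A second backend would only need its own `ls`; `glob` comes from the base class unchanged.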

(I'm currently writing a blog post on how dlt uses fsspec. I'll post it here once it's done.)

*Based on PyPI stats for downloads last month.

Jul 27, 2024 at 7:16 PM