fsspec is a de facto standard in the Python ecosystem.
It's the 20th most downloaded Python package*
fsspec is a filesystem API specification.
Main benefit: one frontend, different backends.
You can use the same Python code (frontend) to work with different filesystems (backends).
fsspec filesystem functions (frontend):
put_file(..)
mkdir(..)
rmdir(..)
ls(..)
glob(..)
...
Filesystems (backends):
S3
Azure Blob Storage
GCS
local storage
...
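To make "one frontend, different backends" concrete, here is a minimal sketch that runs the same function against two backends that ship with fsspec: local storage and the in-memory filesystem. The helper name roundtrip and the paths are made up for illustration. Swapping in S3, Azure Blob Storage, or GCS would additionally need the matching implementation package and credentials.

```python
import os
import tempfile

import fsspec


def roundtrip(fs, root):
    """Run the same calls against any fsspec filesystem and return what they saw."""
    fs.makedirs(root, exist_ok=True)
    path = f"{root}/hello.txt"
    with fs.open(path, "wb") as f:
        f.write(b"hello")
    # ls returns full paths; keep only the file names so results are comparable
    names = [p.rsplit("/", 1)[-1] for p in fs.ls(root, detail=False)]
    content = fs.cat_file(path)
    fs.rm(root, recursive=True)
    return names, content


# Backend 1: local storage, inside a throwaway temp directory
with tempfile.TemporaryDirectory() as tmp:
    local_result = roundtrip(fsspec.filesystem("file"), os.path.join(tmp, "demo"))

# Backend 2: fsspec's in-memory filesystem (illustrative path)
memory_result = roundtrip(fsspec.filesystem("memory"), "/fsspec_demo")

print(local_result)
print(memory_result)
```

The function body never mentions which backend it is talking to. Only the fsspec.filesystem(..) call changes.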
Implementations of the specification link the frontend and backend:
fsspec <> s3fs <> (aio)botocore <> S3
fsspec <> adlfs <> Azure SDK for Python <> Azure Blob Storage
fsspec <> gcsfs <> Google Storage API <> GCS
fsspec <> implementations.local <> Python os module <> local storage
fsspec <> ...
s3fs, adlfs, gcsfs, and implementations.local are implementations:
s3fs, adlfs, gcsfs ➜ maintained outside fsspec repo, need separate pip install
implementations.local ➜ maintained inside fsspec repo
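One way to see this split yourself is fsspec's registry, which maps protocol names to implementation classes. The sketch below only reads that mapping, so it should not need s3fs, adlfs, or gcsfs installed. The protocol names used here ("s3", "abfs", "gcs", "file") are the registry keys as I understand them, not something stated in the post above.

```python
import fsspec
from fsspec.registry import known_implementations

# Look up which class each protocol name points at (nothing is imported yet)
for protocol in ("s3", "abfs", "gcs", "file"):
    print(protocol, "->", known_implementations[protocol]["class"])

# The local implementation ships with fsspec, so resolving it always works
local_cls = fsspec.get_filesystem_class("file")
print(local_cls.__name__)
```

Resolving "s3" the same way with get_filesystem_class would import s3fs and fail with an install hint if that package is missing.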
fsspec defines the filesystem specification as an abstract Python class:
class AbstractFileSystem(..)
It also defines a subclass for async operations:
class AsyncFileSystem(AbstractFileSystem)
Implementations subclass AbstractFileSystem directly or via AsyncFileSystem:
s3fs: class S3FileSystem(AsyncFileSystem)
adlfs: class AzureBlobFileSystem(AsyncFileSystem)
gcsfs: class GCSFileSystem(AsyncFileSystem)
fsspec.implementations: class LocalFileSystem(AbstractFileSystem)
...
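The class relationships can be checked directly. This small sketch sticks to classes bundled with fsspec, since s3fs, adlfs, and gcsfs need a separate install. The async_impl flag is a class attribute fsspec uses to mark async-capable filesystems.

```python
from fsspec.asyn import AsyncFileSystem
from fsspec.implementations.local import LocalFileSystem
from fsspec.spec import AbstractFileSystem

# AsyncFileSystem is itself a subclass of the abstract specification class
async_is_abstract = issubclass(AsyncFileSystem, AbstractFileSystem)

# LocalFileSystem subclasses AbstractFileSystem directly, not via AsyncFileSystem
local_is_abstract = issubclass(LocalFileSystem, AbstractFileSystem)
local_is_async = issubclass(LocalFileSystem, AsyncFileSystem)

print(async_is_abstract, local_is_abstract, local_is_async)
print(LocalFileSystem.async_impl, AsyncFileSystem.async_impl)
```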
Each implementation uses backend-specific code to implement the functions defined in the base class.
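As a rough illustration of what "backend-specific code" means, here is a toy implementation whose backend is a plain Python dict. DictFileSystem and the "dictdemo" protocol name are hypothetical and exist only in this sketch. Real implementations such as s3fs override many more methods and handle directories, caching, and errors.

```python
from fsspec.spec import AbstractFileSystem


class DictFileSystem(AbstractFileSystem):
    """Hypothetical toy backend: files live in a dict of name -> bytes."""

    protocol = "dictdemo"  # made-up protocol name, not registered with fsspec
    cachable = False  # skip fsspec's instance cache for this throwaway class

    def __init__(self, files, **kwargs):
        super().__init__(**kwargs)
        self.files = dict(files)

    def ls(self, path="", detail=True, **kwargs):
        # Backend-specific part: listing is just iterating over the dict.
        # The namespace is flat, so the path argument is ignored here.
        entries = [
            {"name": name, "size": len(data), "type": "file"}
            for name, data in sorted(self.files.items())
        ]
        return entries if detail else [e["name"] for e in entries]

    def cat_file(self, path, start=None, end=None, **kwargs):
        # Backend-specific part: reading is a dict lookup plus a byte slice
        return self.files[path][start:end]


fs = DictFileSystem({"a.txt": b"alpha", "b.txt": b"beta"})
listing = fs.ls("", detail=False)
data = fs.cat_file("a.txt")
print(listing, data)
```

The method names and signatures come from the base class. Only the bodies know that the storage is a dict.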
(Currently writing a blog on how dlt uses fsspec. Will post it here once done.)
*Based on PyPI stats for downloads last month.