XetCache
Have you experienced the frustration of long-running functions or Jupyter
notebook cells? XetCache provides persistent caching for these tasks.
Store the cache on your local disk alongside your notebooks and code. You can also use Git LFS or the git-xet extension to store it with your Git repository. Alternatively, let the XetHub service manage your cache, which makes it easy to share with your collaborators.
Our motivations
The xetcache library grew out of our own challenges working with Jupyter notebooks that contain long-running functions. Get the full context by reading this blog post.
Install
To install from source on GitHub, use the following command:
pip install git+https://github.com/xetdata/xetcache.git
Setup
Setup with Local Disk
No additional setup is needed. See Usage below.
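To make "persistent caching on local disk" concrete, here is a minimal sketch of a disk-backed memoization decorator in plain Python. It is not xetcache's implementation or API: the `disk_memo` name and its `cache_dir` parameter are hypothetical, and the sketch assumes the function's arguments and return value can be pickled (and that equal arguments pickle to the same bytes).

```python
import functools
import hashlib
import pickle
from pathlib import Path


def disk_memo(cache_dir):
    """Return a decorator that caches a function's results as pickle files in cache_dir.

    Hypothetical helper for illustration only; not part of xetcache.
    """
    cache_dir = Path(cache_dir)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the function's qualified name and pickled arguments into a file name.
            key_bytes = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            path = cache_dir / f"{hashlib.sha256(key_bytes).hexdigest()}.pkl"
            if path.exists():
                # Cache hit: load the stored result instead of recomputing it.
                with path.open("rb") as f:
                    return pickle.load(f)
            # Cache miss: compute the result, then persist it for later calls.
            result = func(*args, **kwargs)
            cache_dir.mkdir(parents=True, exist_ok=True)
            with path.open("wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator
```

Decorating a slow function with, say, `@disk_memo("memo_cache")` means a second call with the same arguments reads the pickled result from disk instead of recomputing it, and because the results live in files rather than in memory, they survive a kernel restart.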
Setup with Git Storage
If you use Git LFS, you can commit and push the cache files in the
xmemo folder directly.
With GitHub, we recommend XetHub's extensions to
GitHub for performance and for the ability to lazily fetch cached objects.
For instance, a repository with the XetHub extension lets you
lazily clone it with
git xet clone --lazy [repo]
which avoids fetching any large cached objects until they are
actually needed.
Setup with XetHub
The fully managed XetHub service offers more powerful data
deduplication, which allows similar objects to be stored or loaded
without uploading or downloading everything. Deploy caches near
your compute workloads to accelerate data loading by over 10x.
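This README does not describe how XetHub's deduplication works internally. As a simplified illustration of the general idea only, the sketch below splits data into fixed-size chunks, hashes each chunk, and reports which chunks are not already stored and would therefore need uploading. The function names and the tiny `chunk_size` are hypothetical. Production systems typically use content-defined chunking instead, so that inserting bytes does not shift every later chunk boundary.

```python
from __future__ import annotations

import hashlib


def chunk_hashes(data: bytes, chunk_size: int = 4) -> list[str]:
    """Split data into fixed-size chunks and return the SHA-256 digest of each."""
    return [
        hashlib.sha256(data[i : i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(new: bytes, stored: set[str], chunk_size: int = 4) -> list[int]:
    """Return the indices of chunks in `new` whose hashes are not already stored."""
    return [i for i, h in enumerate(chunk_hashes(new, chunk_size)) if h not in stored]


# Only the middle 4 bytes differ, so only chunk 2 needs to be uploaded.
old = b"aaaabbbbccccdddd"
new = b"aaaabbbbXXXXdddd"
print(chunks_to_upload(new, set(chunk_hashes(old))))  # [2]
```

The same check works in the other direction: when loading a similar object, only the chunks missing locally have to be downloaded.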
Authentication
Sign up on XetHub and obtain
a username and access token. Keep this information handy.
There are three ways to authenticate with XetHub:
Command Line
xet login -e
xet login writes authentication information to ~/.xetconfig.
Environment Variable
Environment variables may sometimes be more convenient:
export XET_USER_EMAIL=<email>
export XET_USER_NAME=<username>
export XET_USER_TOKEN=<access_token>
Python
You can also authenticate directly from Python, which is especially useful in a notebook environment or another non-persistent environment. Note that this must be the first thing you run before any other operation:
import pyxet
pyxet.login(
Usage with XetHub
Optional: to cache on XetHub, you need to run:
import xetcache
xetcache.set_xet_project([give a project name here])
Usage
For Jupyter Notebooks
For Jupyter notebooks, first load the xetcache extension with IPython's %load_ext magic. After that, adding the following line to the top of a cell
%%xetmemo input=v1,v2 output=v3,v4
will cache the specified output variables (v3 and v4 here) each time the cell is run.
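One way to picture what a cell annotated with `%%xetmemo input=v1,v2 output=v3,v4` does is the sketch below, which caches a cell's declared output variables under a key built from the cell's source and the values of its declared input variables. It is a hypothetical illustration that runs the cell with `exec`, not xetcache's implementation: the `run_cell_memoized` name and its `memo_dir` parameter are invented for the example, and the input and output values are assumed to be picklable.

```python
import hashlib
import pickle
from pathlib import Path


def run_cell_memoized(source, namespace, inputs, outputs, memo_dir):
    """Run a cell's source in namespace unless its declared outputs are already cached.

    Hypothetical helper for illustration only; not part of xetcache.
    Returns True on a cache hit (cell skipped) and False when the cell was executed.
    """
    memo_dir = Path(memo_dir)
    # Key on the cell's source and the current values of its declared inputs.
    hasher = hashlib.sha256(source.encode("utf-8"))
    hasher.update(pickle.dumps([(name, namespace[name]) for name in inputs]))
    path = memo_dir / f"{hasher.hexdigest()}.pkl"
    if path.exists():
        # Cache hit: restore the declared output variables without running the cell.
        with path.open("rb") as f:
            namespace.update(pickle.load(f))
        return True
    # Cache miss: execute the cell, then save only the declared outputs.
    exec(source, namespace)
    memo_dir.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump({name: namespace[name] for name in outputs}, f)
    return False
```

In this sketch, changing v1 or v2, or editing the cell, produces a new key and the cell runs again, while rerunning with the same values restores v3 and v4 from disk without executing the cell.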
