Boost Your Jupyter Notebook Reruns with Cell Caching

XetCache

Have you experienced the frustration of long-running functions or Jupyter
notebook cells? XetCache provides persistent caching for these tasks.
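
As a rough illustration of what persistent caching means here, the sketch below stores a function's result on disk, keyed by a hash of its arguments, so that a later call (even from a new Python session) can load the result instead of recomputing it. This is a generic sketch of the technique, not XetCache's implementation: `disk_memo`, the `.pkl` file layout, and the use of `pickle` are all assumptions made for the example.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_memo(cache_dir):
    """Hypothetical decorator factory: persist a function's results under cache_dir."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key on the function name plus the pickled arguments.
            payload = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            key = hashlib.sha256(payload).hexdigest()
            entry = cache_path / f"{key}.pkl"
            if entry.exists():
                # Cache hit: load the stored result instead of recomputing.
                return pickle.loads(entry.read_bytes())
            result = func(*args, **kwargs)
            entry.write_bytes(pickle.dumps(result))
            return result

        return wrapper

    return decorator


cache_dir = tempfile.mkdtemp()
calls = []  # records how often the function body really runs


@disk_memo(cache_dir)
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)   # computed, then written to disk
second = slow_square(4)  # loaded from disk; the body does not run again
```

Arguments and results must be picklable for this sketch to work, and changing the function body does not invalidate old entries, which is one reason a dedicated library is preferable to a hand-rolled cache.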

Store the cache on your local disk alongside your notebooks and code. You can also take advantage of LFS or the git-xet extension to store it with your git repo. Alternatively, let the XetHub service handle your cache management, making it easy to share with your collaborators.

Our motivations

The xetcache library was born out of our own challenges while working with Jupyter Notebooks containing long-running functions. Get the full context by reading this blog post.

Install

To install from source from GitHub, use the following command:

pip install git+https://github.com/xetdata/xetcache.git

Setup

Setup With Local

No additional setup is needed. See Usage below.

Setup With Git Storage

If using LFS, you can directly commit and push the cache files in the
xmemo folder.

With GitHub, we recommend using XetHub’s extensions to GitHub, both for
performance and for the ability to lazily fetch cached objects.

For instance, a repository with the XetHub extension will allow you to
lazily clone the repository with

git xet clone --lazy [repo]

which will avoid fetching any large cached objects until they are
actually needed.

Setup With XetHub

The fully managed XetHub service offers more powerful data
deduplication capabilities that allow similar objects to be stored or loaded
without needing to upload or download everything. Deploy caches near
your compute workloads to accelerate data loading by over 10x.

Authentication

Sign up on XetHub and obtain
a username and access token. Keep this information handy.

Three ways to authenticate with XetHub are available:

Command Line

xet login -e <email> -u <username> -p <access_token>

xet login will write authentication information to ~/.xetconfig.

Environment Variable

Environment variables may sometimes be more convenient:

export XET_USER_EMAIL=<email>
export XET_USER_NAME=<username>
export XET_USER_TOKEN=<access_token>

In Python

Authenticate directly from Python, which is especially useful in a notebook or other non-persistent environment. Note that
this must be the first thing you run before any other operation:

import pyxet
pyxet.login(<username>, <access_token>, <email>)
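
If the credentials are already exported as the environment variables listed above, a small helper can collect and validate them before logging in from Python. The sketch below is only an illustration: `xet_credentials_from_env` is a hypothetical helper, not part of pyxet, and it deliberately stops short of calling `pyxet.login` so that it runs without the library installed.

```python
import os

# Variable names taken from the Environment Variable section; the dict keys
# on the right are labels chosen for this example only.
XET_ENV_VARS = {
    "XET_USER_NAME": "username",
    "XET_USER_TOKEN": "token",
    "XET_USER_EMAIL": "email",
}


def xet_credentials_from_env(environ=None):
    """Hypothetical helper: return the XetHub credentials found in the environment.

    Raises RuntimeError naming every variable that is missing or empty.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in XET_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {label: env[name] for name, label in XET_ENV_VARS.items()}
```

In a notebook, the returned values can then be passed to `pyxet.login` as the first operation of the session, in whatever argument order your installed version documents.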

Usage

Optional: to cache on XetHub, you need to run:

import xetcache
xetcache.set_xet_project([give a project name here])

Usage For Jupyter Notebooks

For Jupyter notebooks, load the extension with this command:

import xetcache

After that, adding the following line to the top of a cell

%%xetmemo input=v1,v2 output=v3,v4

will cache the specified output variables (v3, v4 here) each time it is called.
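
To show the idea behind caching a cell's outputs, here is a minimal sketch that runs a block of code and stores the named output variables, restoring them on a later run instead of executing the code again. It is not how `%%xetmemo` is implemented: `run_cached_cell` is a hypothetical function, and keying the cache on the cell text plus the current input values is an assumption made for the example.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def run_cached_cell(source, namespace, inputs, outputs, cache_dir):
    """Hypothetical sketch: run `source` in `namespace`, caching the output variables.

    Returns True when the outputs were restored from the cache.
    """
    # Assumed cache key: the cell text plus the current values of the inputs.
    input_values = [(name, namespace[name]) for name in inputs]
    key = hashlib.sha256(pickle.dumps((source, input_values))).hexdigest()
    entry = Path(cache_dir) / f"{key}.pkl"
    if entry.exists():
        # Cache hit: restore the outputs without executing the cell.
        namespace.update(pickle.loads(entry.read_bytes()))
        return True
    exec(source, namespace)
    entry.write_bytes(pickle.dumps({name: namespace[name] for name in outputs}))
    return False


cache_dir = tempfile.mkdtemp()
cell = "v3 = v1 + v2\nv4 = v1 * v2\n"

session = {"v1": 2, "v2": 5}
hit_first = run_cached_cell(cell, session, ["v1", "v2"], ["v3", "v4"], cache_dir)

fresh = {"v1": 2, "v2": 5}  # stands in for a restarted kernel
hit_second = run_cached_cell(cell, fresh, ["v1", "v2"], ["v3", "v4"], cache_dir)
```

The first call executes the cell and writes `v3` and `v4` to disk; the second call, with the same cell text and inputs, fills them in from the cache.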
