
caches

CacheDB

Bases: ABC

Abstract base class for connecting to a local SQLite cache database.

This class is intended to always be used with a context manager to properly close the connection to the cache database.

__enter__()

Called when entering the context manager.

__exit__(exc_type, exc_value, exc_traceback)

Called when exiting the context manager.
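The context-manager contract described above can be sketched with the standard `sqlite3` module. `SketchCacheDB`, its `db_path` argument, and its `db_connection` attribute are hypothetical names used only for illustration. The real `CacheDB` may open and track its connection differently.

```python
import sqlite3
from abc import ABC
from typing import Optional


class SketchCacheDB(ABC):
    """Hypothetical stand-in for CacheDB: open on enter, always close on exit."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db_path = db_path
        self.db_connection: Optional[sqlite3.Connection] = None

    def __enter__(self) -> "SketchCacheDB":
        # Open the SQLite connection when the `with` block starts.
        self.db_connection = sqlite3.connect(self.db_path)
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        # Runs even if the body raised; returning None lets the exception propagate.
        if self.db_connection is not None:
            self.db_connection.close()
            self.db_connection = None


with SketchCacheDB() as cache:
    print(cache.db_connection.execute("SELECT 1").fetchone())  # (1,)
print(cache.db_connection)  # None: the connection was closed on exit
```

Because `__exit__` runs on both normal and exceptional exits, the connection is released even when the body of the `with` block fails.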

get_default_cache_db_file_dir() classmethod

Gets the expected directory for the cache database file based on OS environment variables. If a directory cannot be found, defaults to the working directory.
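A minimal sketch of the "environment variable, else working directory" lookup follows. The candidate variables (`HOME`, `USERPROFILE`), the `.example_cache` subdirectory, and the function name are all assumptions for illustration; this reference does not say which variables `get_default_cache_db_file_dir()` actually consults.

```python
import os
from typing import Mapping, Sequence

# Hypothetical candidate variables, checked in order; the variables CacheDB
# actually consults are not listed in this reference.
CANDIDATE_ENV_VARS = ("HOME", "USERPROFILE")


def sketch_default_cache_db_file_dir(
    environ: Mapping[str, str] = os.environ,
    candidates: Sequence[str] = CANDIDATE_ENV_VARS,
    subdir: str = ".example_cache",  # hypothetical subdirectory name
) -> str:
    for name in candidates:
        base = environ.get(name)
        if base:  # skip unset or empty variables
            return os.path.join(base, subdir)
    # No candidate variable found: fall back to the working directory.
    return os.getcwd()


print(sketch_default_cache_db_file_dir({}) == os.getcwd())  # True
```

Passing the environment in as a mapping keeps the lookup easy to test without mutating `os.environ`.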

get_local_connection()

Creates and/or returns a thread-local connection to the SQLite database.
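One common way to implement "create and/or return a thread-local connection" is `threading.local()`, sketched below. `SketchLocalConnections` is a hypothetical helper, and using `threading.local` here is an assumption about the technique, not a statement about how `CacheDB` is implemented.

```python
import sqlite3
import threading


class SketchLocalConnections:
    """Hypothetical helper that hands each thread its own SQLite connection."""

    def __init__(self, db_path: str) -> None:
        self.db_path = db_path
        self._local = threading.local()  # attributes stored here are per-thread

    def get_local_connection(self) -> sqlite3.Connection:
        conn = getattr(self._local, "connection", None)
        if conn is None:
            # First call on this thread: create the connection. sqlite3 objects
            # are bound to their creating thread by default (check_same_thread=True).
            conn = sqlite3.connect(self.db_path)
            self._local.connection = conn
        return conn


helper = SketchLocalConnections(":memory:")
main_conn = helper.get_local_connection()
seen = []
worker = threading.Thread(target=lambda: seen.append(helper.get_local_connection()))
worker.start()
worker.join()
print(main_conn is helper.get_local_connection(), seen[0] is main_conn)  # True False
```

With `":memory:"` each thread gets a separate, empty database; a real cache would pass a file path so every thread's connection opens the same file.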

remove_cache()

Removes the underlying cache contents from the file system.

HashCache

Bases: CacheDB

Class used to store and retrieve entries in the local file hash cache.

This class is intended to always be used with a context manager to properly close the connection to the hash cache database.

This class also automatically locks when doing writes, so it can be called by multiple threads.

Schema (hashesV4):
- file_path: blob (part of composite primary key)
- hash_algorithm: text (part of composite primary key)
- range_start: integer (part of composite primary key)
- range_end: integer (part of composite primary key)
- file_hash: text
- last_modified_time: timestamp

For whole-file hashes, range_start=0 and range_end=-1. For byte-range hashes, range_start and range_end define [start, end).
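The schema and range convention above can be expressed as SQLite DDL. The statement below is inferred from the documented column list and is an assumption, not the library's own DDL; `"xxh128"` is only a placeholder algorithm string. It shows that a whole-file row and a byte-range row for the same file coexist because their `range_end` values differ.

```python
import sqlite3

WHOLE_FILE_RANGE_END = -1  # the documented value for whole-file hashes

# DDL inferred from the documented hashesV4 columns; the library's exact
# statement is an assumption.
CREATE_HASHES_V4 = """
CREATE TABLE IF NOT EXISTS hashesV4 (
    file_path BLOB,
    hash_algorithm TEXT,
    range_start INTEGER,
    range_end INTEGER,
    file_hash TEXT,
    last_modified_time TIMESTAMP,
    PRIMARY KEY (file_path, hash_algorithm, range_start, range_end)
)
"""
INSERT = "INSERT INTO hashesV4 VALUES (?, ?, ?, ?, ?, ?)"

conn = sqlite3.connect(":memory:")
conn.execute(CREATE_HASHES_V4)
path = "/data/a.bin".encode("utf-8")  # file_path is stored as a blob
# Whole-file hash: range_start=0, range_end=-1.
conn.execute(INSERT, (path, "xxh128", 0, WHOLE_FILE_RANGE_END, "hash-whole", "2024-01-01 00:00:00"))
# Hash of bytes [0, 1024): same path and algorithm, but a distinct key because range_end differs.
conn.execute(INSERT, (path, "xxh128", 0, 1024, "hash-chunk0", "2024-01-01 00:00:00"))
print(conn.execute("SELECT COUNT(*) FROM hashesV4").fetchone()[0])  # 2
```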

get_connection_entry(file_path_key, hash_algorithm, connection, range_start=0, range_end=WHOLE_FILE_RANGE_END)

Returns an entry from the hash cache, if it exists.

This is the "lockless" version of get_entry: it expects a connection parameter, which is used to read from the database. This can generally be the thread-local connection returned by get_local_connection().

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_path_key | str | The file path to look up | required |
| hash_algorithm | HashAlgorithm | The hash algorithm used | required |
| connection | Any | SQLite connection to use | required |
| range_start | int | Start byte offset (0 for whole-file) | 0 |
| range_end | int | End byte offset (-1/WHOLE_FILE_RANGE_END for whole-file) | WHOLE_FILE_RANGE_END |

Returns:

| Type | Description |
|---|---|
| Optional[HashCacheEntry] | HashCacheEntry if found, None otherwise |
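The difference between the lockless `get_connection_entry` and the locking `get_entry` can be sketched as two functions. Everything here is hypothetical: the function names, the module-level `db_lock`, returning a `(file_hash, last_modified_time)` tuple instead of a `HashCacheEntry`, and UTF-8 encoding the string key into the `file_path` blob. The sketch's `get_entry` takes the shared connection as an argument only to stay self-contained.

```python
import sqlite3
import threading
from typing import Any, Optional, Tuple

WHOLE_FILE_RANGE_END = -1

SELECT_ENTRY = (
    "SELECT file_hash, last_modified_time FROM hashesV4 "
    "WHERE file_path = ? AND hash_algorithm = ? AND range_start = ? AND range_end = ?"
)


def sketch_get_connection_entry(
    file_path_key: str,
    hash_algorithm: str,
    connection: Any,
    range_start: int = 0,
    range_end: int = WHOLE_FILE_RANGE_END,
) -> Optional[Tuple[str, str]]:
    # Lockless read: uses whatever connection the caller passes in
    # (typically a thread-local one) and takes no shared lock.
    key = file_path_key.encode("utf-8")  # assumed str -> blob encoding
    return connection.execute(
        SELECT_ENTRY, (key, hash_algorithm, range_start, range_end)
    ).fetchone()  # None when no row matches


db_lock = threading.Lock()  # hypothetical lock guarding one shared connection


def sketch_get_entry(file_path_key, hash_algorithm, shared_connection,
                     range_start=0, range_end=WHOLE_FILE_RANGE_END):
    # Locked variant: serialize use of the shared connection, then delegate.
    with db_lock:
        return sketch_get_connection_entry(
            file_path_key, hash_algorithm, shared_connection, range_start, range_end
        )
```

The lockless form lets many reader threads proceed in parallel on their own connections, while the locked form is safe when every caller shares a single connection.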

get_entry(file_path_key, hash_algorithm, range_start=0, range_end=WHOLE_FILE_RANGE_END)

Returns an entry from the hash cache, if it exists.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_path_key | str | The file path to look up | required |
| hash_algorithm | HashAlgorithm | The hash algorithm used | required |
| range_start | int | Start byte offset (0 for whole-file) | 0 |
| range_end | int | End byte offset (-1/WHOLE_FILE_RANGE_END for whole-file) | WHOLE_FILE_RANGE_END |

Returns:

| Type | Description |
|---|---|
| Optional[HashCacheEntry] | HashCacheEntry if found, None otherwise |

put_entry(entry)

Inserts or replaces an entry into the hash cache database after acquiring the lock.

The entry's range_start and range_end determine whether this is a whole-file hash (range_start=0, range_end=-1) or a byte-range hash.
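A lock-then-upsert write like the one `put_entry` describes can be sketched with `INSERT OR REPLACE`. `SketchHashWriter` and its flattened argument list are hypothetical (the real method takes a single `entry`), and pairing a `threading.Lock` with one shared connection is an assumed design, consistent with the class note that writes are locked.

```python
import sqlite3
import threading

WHOLE_FILE_RANGE_END = -1


class SketchHashWriter:
    """Hypothetical writer: one shared connection, writes serialized by a lock."""

    def __init__(self) -> None:
        self.db_lock = threading.Lock()
        # check_same_thread=False lets worker threads use this connection;
        # the lock ensures only one of them touches it at a time.
        self.db_connection = sqlite3.connect(":memory:", check_same_thread=False)
        self.db_connection.execute(
            "CREATE TABLE hashesV4 (file_path BLOB, hash_algorithm TEXT, "
            "range_start INTEGER, range_end INTEGER, file_hash TEXT, "
            "last_modified_time TIMESTAMP, "
            "PRIMARY KEY (file_path, hash_algorithm, range_start, range_end))"
        )

    def put_entry(self, file_path: str, hash_algorithm: str, file_hash: str,
                  last_modified_time: str, range_start: int = 0,
                  range_end: int = WHOLE_FILE_RANGE_END) -> None:
        # Acquire the lock, then use the connection as a transaction
        # (commits on success, rolls back on error).
        with self.db_lock, self.db_connection:
            # INSERT OR REPLACE overwrites any row with the same composite key.
            self.db_connection.execute(
                "INSERT OR REPLACE INTO hashesV4 VALUES (?, ?, ?, ?, ?, ?)",
                (file_path.encode("utf-8"), hash_algorithm, range_start,
                 range_end, file_hash, last_modified_time),
            )


writer = SketchHashWriter()
threads = [
    threading.Thread(target=writer.put_entry, args=(f"f{i}", "algo", f"h{i}", "t0"))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
writer.put_entry("f0", "algo", "h0-new", "t1")  # replaces the f0 row rather than adding one
```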

HashCacheEntry dataclass

Represents an entry in the local hash-cache database.

- For whole-file hashes: range_start=0, range_end=-1 (WHOLE_FILE_RANGE_END).
- For chunk hashes: range_start and range_end define the byte range [start, end).

is_whole_file()

Returns True if this entry represents a whole-file hash.
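The whole-file check reduces to comparing against the documented sentinel values. The dataclass below is a hypothetical stand-in: its field names mirror the hashesV4 columns, but the field types and ordering are assumptions.

```python
from dataclasses import dataclass

WHOLE_FILE_RANGE_END = -1


@dataclass
class SketchHashCacheEntry:
    # Field names mirror the hashesV4 columns; types and order are assumptions.
    file_path: str
    hash_algorithm: str
    file_hash: str
    last_modified_time: str
    range_start: int = 0
    range_end: int = WHOLE_FILE_RANGE_END

    def is_whole_file(self) -> bool:
        # Whole-file entries use the documented values range_start=0, range_end=-1.
        return self.range_start == 0 and self.range_end == WHOLE_FILE_RANGE_END


whole = SketchHashCacheEntry("a.bin", "algo", "h", "t")
chunk = SketchHashCacheEntry("a.bin", "algo", "h2", "t", range_start=0, range_end=1024)
print(whole.is_whole_file(), chunk.is_whole_file())  # True False
```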

S3CheckCache

Bases: CacheDB

Maintains a cache of 'last seen on S3' entries in a local database, recording which full S3 object keys exist in the content-addressed storage of the Job Attachments S3 bucket.

This class is intended to always be used with a context manager to properly close the connection to the S3 check cache database.

This class also automatically locks when doing writes, so it can be called by multiple threads.

get_connection_entry(s3_key, connection)

Returns an entry from the S3 check cache, if it exists.

This is the "lockless" version of get_entry: it does not take the main db_lock protecting db_connection, and instead expects a connection parameter that is used to read from the database. This can generally be the thread-local connection returned by get_local_connection().

get_entry(s3_key)

Checks if an entry exists in the cache, and returns it if it hasn't expired.
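The "return it only if it hasn't expired" rule can be sketched as a time-to-live check. This reference gives neither the expiry period nor the entry's fields, so the 30-day `ENTRY_TTL`, the `last_seen_time` field, and the names `SketchS3CheckEntry` and `sketch_s3_get_entry` are all hypothetical, and a plain dict stands in for the SQLite table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hypothetical TTL; this reference does not state how long entries stay valid.
ENTRY_TTL = timedelta(days=30)


@dataclass
class SketchS3CheckEntry:
    s3_key: str
    last_seen_time: datetime  # hypothetical field name


def sketch_s3_get_entry(cache: Dict[str, SketchS3CheckEntry], s3_key: str,
                        now: datetime,
                        ttl: timedelta = ENTRY_TTL) -> Optional[SketchS3CheckEntry]:
    entry = cache.get(s3_key)
    if entry is None:
        return None  # never recorded as seen on S3
    if now - entry.last_seen_time > ttl:
        return None  # recorded, but too long ago to trust
    return entry


t0 = datetime(2024, 1, 1)
cache = {"example/key": SketchS3CheckEntry("example/key", t0)}
print(sketch_s3_get_entry(cache, "example/key", t0 + timedelta(days=1)) is not None)  # True
print(sketch_s3_get_entry(cache, "example/key", t0 + timedelta(days=31)))  # None
```

Treating a stale entry as a miss means the caller falls back to checking S3 directly, so an object removed from the bucket is not assumed to exist indefinitely.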

put_entry(entry)

Inserts or replaces an entry into the cache database.

S3CheckCacheEntry dataclass

Represents an entry in the local S3 check cache database.