caches
CacheDB
Bases: ABC
Abstract base class for connecting to a local SQLite cache database.
This class is intended to always be used with a context manager to properly close the connection to the cache database.
__enter__()
Called when entering the context manager.
__exit__(exc_type, exc_value, exc_traceback)
Called when exiting the context manager.
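The sketch below shows the context-manager contract described above. It assumes a plain `sqlite3` connection that is opened in `__enter__` and closed in `__exit__`. `SketchCacheDB` and its `db_path` attribute are hypothetical names used for illustration. This is not the library's implementation.

```python
import sqlite3
from abc import ABC
from typing import Optional


class SketchCacheDB(ABC):
    """Hypothetical stand-in for CacheDB; not the library's implementation."""

    def __init__(self, db_path: str) -> None:
        self.db_path = db_path
        self.db_connection: Optional[sqlite3.Connection] = None

    def __enter__(self) -> "SketchCacheDB":
        # Open the connection when the `with` block starts.
        self.db_connection = sqlite3.connect(self.db_path)
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback) -> bool:
        # Close the connection whether or not the block raised.
        if self.db_connection is not None:
            self.db_connection.close()
            self.db_connection = None
        # Returning False lets any exception from the block propagate.
        return False


with SketchCacheDB(":memory:") as cache:
    cache.db_connection.execute("SELECT 1")
```

Because `__exit__` always runs, the connection is released even when the body of the `with` block raises.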
get_default_cache_db_file_dir()
classmethod
Gets the expected directory for the cache database file based on OS environment variables. If a directory cannot be found, defaults to the working directory.
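The sketch below shows the lookup-with-fallback behavior described above. This page does not say which environment variables the real method reads. `HOME` and `USERPROFILE` are assumptions. `default_cache_dir` is a hypothetical helper that takes the environment as a parameter so it is easy to test.

```python
import os
from typing import Mapping, Optional, Sequence

# Assumption: the real method's variable names are not documented here.
CANDIDATE_ENV_VARS: Sequence[str] = ("HOME", "USERPROFILE")


def default_cache_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return the first non-empty candidate variable,
    falling back to the current working directory."""
    env = os.environ if env is None else env
    for name in CANDIDATE_ENV_VARS:
        value = env.get(name)
        if value:
            return value
    # No usable variable was found, so use the working directory.
    return os.getcwd()
```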
get_local_connection()
Creates and/or returns a thread-local connection to the SQLite database.
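A common way to provide one SQLite connection per thread is `threading.local()`, sketched below. `ThreadLocalConnections` is a hypothetical class. The library may implement `get_local_connection()` differently.

```python
import sqlite3
import threading


class ThreadLocalConnections:
    """Hypothetical sketch: lazily create one SQLite connection per thread."""

    def __init__(self, db_path: str) -> None:
        self.db_path = db_path
        self._local = threading.local()

    def get_local_connection(self) -> sqlite3.Connection:
        # Each thread sees its own attributes on self._local.
        conn = getattr(self._local, "connection", None)
        if conn is None:
            # First request from this thread: create and remember a connection.
            conn = sqlite3.connect(self.db_path)
            self._local.connection = conn
        return conn
```

By default, a Python `sqlite3` connection raises an error when used from a thread other than the one that created it (`check_same_thread=True`). That is one reason to keep a connection per thread. With `":memory:"`, each connection gets its own separate database, so a real cache would pass a file path.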
remove_cache()
Removes the underlying cache contents from the file system.
HashCache
Bases: CacheDB
Class used to store and retrieve entries in the local file hash cache.
This class is intended to always be used with a context manager to properly close the connection to the hash cache database.
This class also automatically locks when doing writes, so it can be called by multiple threads.
Schema (hashesV4):
- file_path: blob (part of composite primary key)
- hash_algorithm: text (part of composite primary key)
- range_start: integer (part of composite primary key)
- range_end: integer (part of composite primary key)
- file_hash: text
- last_modified_time: timestamp
For whole-file hashes, range_start=0 and range_end=-1. For byte-range hashes, range_start and range_end define the half-open byte range [range_start, range_end).
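The hashesV4 schema above could be written as the following DDL. This is a sketch. The exact statement the library runs, and the `"xxh128"` algorithm string in the sample rows, are assumptions. The example shows that a whole-file hash and a byte-range hash for the same file can coexist, because the range columns are part of the composite primary key.

```python
import sqlite3

WHOLE_FILE_RANGE_END = -1  # sentinel value stated in the docs above

# Sketch of the hashesV4 table; the library's actual DDL may differ.
HASHES_V4_DDL = """
CREATE TABLE IF NOT EXISTS hashesV4 (
    file_path blob,
    hash_algorithm text,
    range_start integer,
    range_end integer,
    file_hash text,
    last_modified_time timestamp,
    PRIMARY KEY (file_path, hash_algorithm, range_start, range_end)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(HASHES_V4_DDL)
# Whole-file hash: (0, -1).
conn.execute(
    "INSERT INTO hashesV4 VALUES (?, ?, ?, ?, ?, ?)",
    (b"/data/a.bin", "xxh128", 0, WHOLE_FILE_RANGE_END, "abc", "2024-01-01 00:00:00"),
)
# Byte-range hash for bytes [0, 1024) of the same file: a distinct key.
conn.execute(
    "INSERT INTO hashesV4 VALUES (?, ?, ?, ?, ?, ?)",
    (b"/data/a.bin", "xxh128", 0, 1024, "def", "2024-01-01 00:00:00"),
)
conn.commit()
```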
get_connection_entry(file_path_key, hash_algorithm, connection, range_start=0, range_end=WHOLE_FILE_RANGE_END)
Returns an entry from the hash cache, if it exists.
This is the "lockless" version of get_entry. It expects a connection parameter, which is the connection used to read from the DB. This can generally be the thread-local connection returned by get_local_connection().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path_key | str | The file path to look up | required |
| hash_algorithm | HashAlgorithm | The hash algorithm used | required |
| connection | Any | SQLite connection to use | required |
| range_start | int | Start byte offset (0 for whole-file) | 0 |
| range_end | int | End byte offset (-1/WHOLE_FILE_RANGE_END for whole-file) | WHOLE_FILE_RANGE_END |
Returns:
| Type | Description |
|---|---|
| Optional[HashCacheEntry] | HashCacheEntry if found, None otherwise |
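The sketch below shows the lockless lookup. It assumes the hashesV4 table described above. It also assumes the file path is stored as UTF-8 bytes in the `file_path` blob column. `Entry` is a hypothetical stand-in for `HashCacheEntry`. To keep the example self-contained, `hash_algorithm` is a plain string rather than a `HashAlgorithm` value.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

WHOLE_FILE_RANGE_END = -1


@dataclass
class Entry:
    """Hypothetical stand-in for HashCacheEntry."""
    file_path: str
    hash_algorithm: str
    file_hash: str
    last_modified_time: str
    range_start: int = 0
    range_end: int = WHOLE_FILE_RANGE_END


def get_connection_entry(
    file_path_key: str,
    hash_algorithm: str,
    connection: sqlite3.Connection,
    range_start: int = 0,
    range_end: int = WHOLE_FILE_RANGE_END,
) -> Optional[Entry]:
    # No lock is taken here: the caller supplies the connection,
    # typically a thread-local one.
    row = connection.execute(
        "SELECT file_hash, last_modified_time FROM hashesV4 "
        "WHERE file_path=? AND hash_algorithm=? AND range_start=? AND range_end=?",
        (file_path_key.encode("utf-8"), hash_algorithm, range_start, range_end),
    ).fetchone()
    if row is None:
        return None
    return Entry(file_path_key, hash_algorithm, row[0], row[1], range_start, range_end)
```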
get_entry(file_path_key, hash_algorithm, range_start=0, range_end=WHOLE_FILE_RANGE_END)
Returns an entry from the hash cache, if it exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path_key | str | The file path to look up | required |
| hash_algorithm | HashAlgorithm | The hash algorithm used | required |
| range_start | int | Start byte offset (0 for whole-file) | 0 |
| range_end | int | End byte offset (-1/WHOLE_FILE_RANGE_END for whole-file) | WHOLE_FILE_RANGE_END |
Returns:
| Type | Description |
|---|---|
| Optional[HashCacheEntry] | HashCacheEntry if found, None otherwise |
put_entry(entry)
Inserts or replaces an entry into the hash cache database after acquiring the lock.
The entry's range_start and range_end determine whether this is a whole-file hash (range_start=0, range_end=-1) or a byte-range hash.
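The sketch below shows a `put_entry` that writes while holding a lock, paired with a `get_entry` that reads through the shared connection under the same lock. `SketchHashCache` and `Entry` are hypothetical. The single `threading.Lock`, the in-memory database, and `INSERT OR REPLACE` are assumptions chosen to match the "inserts or replaces ... after acquiring the lock" description. That `get_entry` also takes the lock is an assumption for this class.

```python
import sqlite3
import threading
from dataclasses import dataclass
from typing import Optional

WHOLE_FILE_RANGE_END = -1


@dataclass
class Entry:
    """Hypothetical stand-in for HashCacheEntry."""
    file_path: str
    hash_algorithm: str
    file_hash: str
    last_modified_time: str
    range_start: int = 0
    range_end: int = WHOLE_FILE_RANGE_END


class SketchHashCache:
    """Hypothetical sketch of locked reads/writes; not the library code."""

    def __init__(self) -> None:
        # check_same_thread=False lets several threads share this connection;
        # db_lock serializes every use of it.
        self.db_connection = sqlite3.connect(":memory:", check_same_thread=False)
        self.db_lock = threading.Lock()
        self.db_connection.execute(
            "CREATE TABLE hashesV4 (file_path blob, hash_algorithm text,"
            " range_start integer, range_end integer, file_hash text,"
            " last_modified_time timestamp,"
            " PRIMARY KEY (file_path, hash_algorithm, range_start, range_end))"
        )

    def put_entry(self, entry: Entry) -> None:
        with self.db_lock:
            # INSERT OR REPLACE overwrites any row with the same composite key.
            self.db_connection.execute(
                "INSERT OR REPLACE INTO hashesV4 VALUES (?, ?, ?, ?, ?, ?)",
                (
                    entry.file_path.encode("utf-8"),
                    entry.hash_algorithm,
                    entry.range_start,
                    entry.range_end,
                    entry.file_hash,
                    entry.last_modified_time,
                ),
            )
            self.db_connection.commit()

    def get_entry(
        self,
        file_path_key: str,
        hash_algorithm: str,
        range_start: int = 0,
        range_end: int = WHOLE_FILE_RANGE_END,
    ) -> Optional[Entry]:
        with self.db_lock:
            row = self.db_connection.execute(
                "SELECT file_hash, last_modified_time FROM hashesV4 WHERE file_path=?"
                " AND hash_algorithm=? AND range_start=? AND range_end=?",
                (file_path_key.encode("utf-8"), hash_algorithm, range_start, range_end),
            ).fetchone()
        if row is None:
            return None
        return Entry(file_path_key, hash_algorithm, row[0], row[1], range_start, range_end)
```

A single lock around one shared connection serializes every access. The lockless `get_connection_entry` path avoids that contention for reads by using a per-thread connection instead.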
HashCacheEntry
dataclass
Represents an entry in the local hash-cache database.
For whole-file hashes: range_start=0, range_end=-1 (WHOLE_FILE_RANGE_END). For chunk hashes: range_start and range_end define the half-open byte range [range_start, range_end).
is_whole_file()
Returns True if this entry represents a whole-file hash.
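The class below is a hypothetical mirror of `HashCacheEntry`. It shows how `is_whole_file()` can test for the `(0, -1)` sentinel pair. Field names follow the hashesV4 schema. The real dataclass's fields, types, and field order may differ.

```python
from dataclasses import dataclass

WHOLE_FILE_RANGE_END = -1  # sentinel documented above


@dataclass
class SketchHashCacheEntry:
    """Hypothetical mirror of HashCacheEntry; not the library's class."""

    file_path: str
    hash_algorithm: str
    file_hash: str
    last_modified_time: str
    range_start: int = 0
    range_end: int = WHOLE_FILE_RANGE_END

    def is_whole_file(self) -> bool:
        # Whole-file hashes use range_start=0 together with the -1 sentinel.
        return self.range_start == 0 and self.range_end == WHOLE_FILE_RANGE_END
```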
S3CheckCache
Bases: CacheDB
Maintains a cache of 'last seen on S3' entries in a local database. These entries record which full S3 object keys exist in the content-addressed storage in the Job Attachments S3 bucket.
This class is intended to always be used with a context manager to properly close the connection to the S3 check cache database.
This class also automatically locks when doing writes, so it can be called by multiple threads.
get_connection_entry(s3_key, connection)
Returns an entry from the S3 check cache, if it exists. This is the "lockless" version of get_entry: it does not take the main db_lock protecting db_connection. Instead, it expects a connection parameter, which is the connection used to read from the DB. This can generally be the thread-local connection returned by get_local_connection().
get_entry(s3_key)
Checks if an entry exists in the cache, and returns it if it hasn't expired.
put_entry(entry)
Inserts or replaces an entry into the cache database.
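The sketch below shows the expiry check in `get_entry` and the matching `put_entry`. This page does not specify the expiry window, the table layout, or the entry fields. `EXPIRY_SECONDS`, the `s3check` table, the `last_seen_time` field, and the optional `now` parameter are all hypothetical. The `now` parameter exists only to make the example testable.

```python
import sqlite3
import threading
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical expiry window; the real cache's duration is not documented here.
EXPIRY_SECONDS = 30 * 24 * 60 * 60


@dataclass
class SketchS3CheckCacheEntry:
    """Hypothetical entry: an S3 key plus when it was last seen (epoch seconds)."""
    s3_key: str
    last_seen_time: float


class SketchS3CheckCache:
    def __init__(self, expiry_seconds: float = EXPIRY_SECONDS) -> None:
        self.expiry_seconds = expiry_seconds
        self.db_lock = threading.Lock()
        self.db_connection = sqlite3.connect(":memory:", check_same_thread=False)
        self.db_connection.execute(
            "CREATE TABLE s3check (s3_key text PRIMARY KEY, last_seen_time real)"
        )

    def put_entry(self, entry: SketchS3CheckCacheEntry) -> None:
        with self.db_lock:
            # Replacing an existing key refreshes its last-seen time.
            self.db_connection.execute(
                "INSERT OR REPLACE INTO s3check VALUES (?, ?)",
                (entry.s3_key, entry.last_seen_time),
            )
            self.db_connection.commit()

    def get_entry(
        self, s3_key: str, now: Optional[float] = None
    ) -> Optional[SketchS3CheckCacheEntry]:
        now = time.time() if now is None else now
        with self.db_lock:
            row = self.db_connection.execute(
                "SELECT last_seen_time FROM s3check WHERE s3_key=?", (s3_key,)
            ).fetchone()
        if row is None:
            return None
        # An expired entry is reported as missing, so the caller re-checks S3.
        if now - row[0] > self.expiry_seconds:
            return None
        return SketchS3CheckCacheEntry(s3_key, row[0])
```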
S3CheckCacheEntry
dataclass
Represents an entry in the local S3 check cache database.