upload
Classes for handling uploading of assets.
S3AssetManager
Asset handler that creates an asset manifest and uploads assets. Based on an S3 file system.
hash_assets_and_create_manifest(asset_groups, total_input_files, total_input_bytes, hash_cache_dir=None, on_preparing_to_submit=None)
Computes the hashes for input files, and creates manifests using the local hash cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hash_cache_dir` | `Optional[str]` | A path to the local hash cache directory. If `None`, the default path is used. | `None` |
| `on_preparing_to_submit` | `Optional[Callable[[Any], bool]]` | A callback that is called periodically to report progress to the caller. The callback returns `True` if the operation should continue as normal, or `False` to cancel. | `None` |
Returns:
| Type | Description |
|---|---|
| `SummaryStatistics` | A tuple with (1) the summary statistics of the hash operation, and |
| `list[AssetRootManifest]` | (2) a list of `AssetRootManifest` (a manifest and output paths for each asset root). |
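The callback contract described above (return True to continue, False to cancel) can be illustrated with a minimal sketch. The `process_items` function and the shape of the progress payload are hypothetical stand-ins, not part of this module; only the `Callable[[Any], bool]` contract comes from the parameter table.

```python
from typing import Any, Callable, Optional


def process_items(
    items: list[str],
    on_progress: Optional[Callable[[Any], bool]] = None,
) -> list[str]:
    """Hypothetical long-running operation that reports progress per item."""
    done: list[str] = []
    for index, item in enumerate(items, start=1):
        done.append(item)
        # Report progress; a False return value requests cancellation.
        payload = {"processed": index, "total": len(items)}  # assumed payload shape
        if on_progress is not None and not on_progress(payload):
            break
    return done


def cancel_after_two(progress: Any) -> bool:
    # Keep going until two items have been processed, then cancel.
    return progress["processed"] < 2


result = process_items(["a.png", "b.png", "c.png"], cancel_after_two)
```

Here the third item is never processed, because the callback returns False once two items are done.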
prepare_paths_for_upload(input_paths, output_paths, referenced_paths, storage_profile=None, require_paths_exist=False)
Processes all of the paths required for upload, grouping them by asset root and local storage profile locations. Returns an object containing the grouped paths, which also includes a dictionary of input directories and file counts for files that were not under the root path or any local storage profile locations.
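The grouping idea can be sketched as follows. This is a simplified illustration, not the library's implementation: `group_paths_by_root` and `known_roots` are hypothetical names, and the sketch treats asset roots and storage profile locations as one flat list of directories.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_paths_by_root(paths, known_roots):
    """Group paths under the first known root that contains them.

    Paths outside every known root are counted per parent directory.
    """
    grouped = defaultdict(list)
    outside_counts = defaultdict(int)
    roots = [PurePosixPath(r) for r in known_roots]
    for raw in paths:
        path = PurePosixPath(raw)
        for root in roots:
            if root == path or root in path.parents:
                grouped[str(root)].append(str(path))
                break
        else:
            # No known root contains this path: record its directory.
            outside_counts[str(path.parent)] += 1
    return dict(grouped), dict(outside_counts)


grouped, outside = group_paths_by_root(
    ["/proj/a.txt", "/proj/tex/b.png", "/tmp/c.txt"],
    ["/proj"],
)
```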
snapshot_assets(snapshot_dir, manifests, on_snapshotting_assets=None)
Copies all the files for the provided manifests, and the manifests themselves, into a snapshot directory that matches the layout of a job attachments prefix in S3.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `snapshot_dir` | `str` | A directory in which to place the snapshot. Data and manifest files will go in Data and Manifest subdirectories, respectively. | required |
| `manifests` | `list[AssetRootManifest]` | A list of manifests that contain assets to be uploaded. | required |
| `on_snapshotting_assets` | `Optional[Callable[[Any], bool]]` | A callback to be called to periodically report progress to the caller. The callback must return `True` if the operation should continue as normal, or `False` to cancel. | `None` |
Returns:
| Type | Description |
|---|---|
| `SummaryStatistics` | A tuple with (1) the summary statistics of the upload operation, and |
| `Attachments` | (2) the S3 path to the asset manifest file. |
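As a rough illustration of the snapshot layout, the sketch below copies files into a Data subdirectory and writes a manifest into a Manifest subdirectory. Only those two subdirectory names come from the description above. The function name `snapshot_files`, the use of SHA-256 for file names, and the JSON manifest shape are assumptions made for the example.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def snapshot_files(snapshot_dir: Path, source_root: Path, relative_paths: list[str]) -> Path:
    """Copy files into <snapshot_dir>/Data and write a manifest to <snapshot_dir>/Manifest."""
    data_dir = snapshot_dir / "Data"
    manifest_dir = snapshot_dir / "Manifest"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_dir.mkdir(parents=True, exist_ok=True)

    entries = []
    for rel in relative_paths:
        src = source_root / rel
        # Assumption: data files are stored under their content hash.
        digest = hashlib.sha256(src.read_bytes()).hexdigest()
        shutil.copyfile(src, data_dir / digest)
        entries.append({"path": rel, "hash": digest, "size": src.stat().st_size})

    # Assumption: a simple JSON manifest; the real manifest format may differ.
    manifest_path = manifest_dir / "manifest.json"
    manifest_path.write_text(json.dumps({"paths": entries}, indent=2))
    return manifest_path
```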
upload_assets(manifests, on_uploading_assets=None, s3_check_cache_dir=None, manifest_write_dir=None)
Uploads all the files for the provided manifests, and the manifests themselves, to S3.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `manifests` | `list[AssetRootManifest]` | A list of manifests that contain assets to be uploaded. | required |
| `on_uploading_assets` | `Optional[Callable[[Any], bool]]` | A callback to be called to periodically report progress to the caller. The callback returns `True` if the operation should continue as normal, or `False` to cancel. | `None` |
Returns:
| Type | Description |
|---|---|
| `SummaryStatistics` | A tuple with (1) the summary statistics of the upload operation, and |
| `Attachments` | (2) the S3 path to the asset manifest file. |
S3AssetUploader
Handler for uploading assets to S3 based on an asset manifest. If no session is provided, the default credentials path is used; see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials
file_already_uploaded(bucket, key)
Check whether the file has already been uploaded by doing a head-object call.
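A head-object existence check can be sketched as below. This is an illustration under assumptions, not this method's actual code. `object_exists`, `FakeS3Client`, and `FakeNotFound` are hypothetical names, and the fake client stands in for a boto3 S3 client so the example runs without AWS access. With a real client, the error raised for a missing key is `botocore.exceptions.ClientError`, which carries the same `response` dictionary inspected here.

```python
from typing import Any


def object_exists(s3_client: Any, bucket: str, key: str) -> bool:
    """Return True if head_object succeeds, False if the object is not found."""
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception as exc:
        # With boto3 this would be botocore.exceptions.ClientError; the sketch
        # inspects the `response` attribute so it does not need botocore.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code in ("404", "NoSuchKey", "NotFound"):
            return False
        raise  # Other failures (e.g. access denied) are not "missing".


class FakeNotFound(Exception):
    """Hypothetical stand-in for a 404 ClientError."""
    response = {"Error": {"Code": "404"}}


class FakeS3Client:
    """Hypothetical in-memory stand-in for an S3 client."""

    def __init__(self, keys):
        self._keys = set(keys)

    def head_object(self, Bucket, Key):
        if Key not in self._keys:
            raise FakeNotFound()
        return {"ContentLength": 0}


client = FakeS3Client({"Data/abc"})
```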
reset_s3_check_cache(s3_check_cache_dir)
Resets the S3 check cache by removing the cache altogether.
upload_assets(job_attachment_settings, manifest, source_root, partial_manifest_prefix=None, file_system_location_name=None, progress_tracker=None, s3_check_cache_dir=None, manifest_write_dir=None, manifest_name_suffix='input', manifest_metadata=dict(), manifest_file_name=None, asset_root=None)
Uploads assets based on an asset manifest, and also uploads the asset manifest itself.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `manifest` | `BaseAssetManifest` | The asset manifest to upload. | required |
| `partial_manifest_prefix` | `Optional[str]` | The (partial) key prefix to use for uploading the manifest to S3, excluding the initial section " | `None` |
| `source_root` | `Path` | The local root path of the assets. | required |
| `job_attachment_settings` | `JobAttachmentS3Settings` | The settings for the job attachment configured in Queue. | required |
| `progress_tracker` | `Optional[ProgressTracker]` | Optional progress tracker to track progress. | `None` |
| `manifest_name_suffix` | `str` | Suffix for given manifest naming. | `'input'` |
| `manifest_metadata` | `dict[str, dict[str, str]]` | File metadata for given manifest to be uploaded. | `dict()` |
| `manifest_file_name` | `Optional[str]` | Optional file name for given manifest to be uploaded, otherwise use default name. | `None` |
| `asset_root` | `Optional[Path]` | The root in which the assets actually reside, used to facilitate path mapping. | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple[str, str]` | A tuple of (the partial key for the manifest on S3, the hash of input manifest). |
upload_file_to_s3(local_path, s3_bucket, s3_upload_key, progress_tracker=None, base_dir_path=None)
Uploads a single file to an S3 bucket using TransferManager, allowing mid-way
cancellation. It monitors for upload progress through a callback, handler,
which also checks if the upload should continue or not. If the progress_tracker
signals to stop, the ongoing upload is cancelled.
upload_input_files(manifest, s3_bucket, source_root, s3_cas_prefix, progress_tracker=None, s3_check_cache_dir=None)
Uploads all of the files listed in the given manifest to S3 if they don't exist in the given S3 prefix already.
The local 'S3 check cache' is used to note if we've seen an object in S3 before so we can save the S3 API calls.
upload_object_to_cas(file, hash_algorithm, s3_bucket, source_root, s3_cas_prefix, s3_check_cache, progress_tracker=None)
Uploads an object to the S3 content-addressable storage (CAS) prefix. Optionally, does a head-object check and only uploads the file if it doesn't exist in S3 already. Returns a tuple (whether it has been uploaded, the file size).
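The interplay between content-addressable keys, the local check cache, and the existence check might look like the sketch below. It is a simplified model, not this method's code: a dictionary stands in for the S3 bucket, a set stands in for the S3 check cache, SHA-256 is an assumed hash algorithm, and `upload_to_cas` is a hypothetical name.

```python
import hashlib


def upload_to_cas(data: bytes, prefix: str, store: dict, check_cache: set) -> tuple:
    """Store `data` under a key derived from its hash unless it is already present.

    Returns (whether an upload happened, size of the data in bytes).
    """
    key = f"{prefix}/{hashlib.sha256(data).hexdigest()}"
    if key in check_cache:
        # Seen before: skip the existence check entirely.
        return (False, len(data))
    if key in store:
        # Stand-in for a head-object call that finds the object.
        check_cache.add(key)
        return (False, len(data))
    store[key] = data  # Stand-in for the actual upload.
    check_cache.add(key)
    return (True, len(data))


store, cache = {}, set()
first = upload_to_cas(b"hello", "Data", store, cache)
second = upload_to_cas(b"hello", "Data", store, cache)
```

Because the key depends only on the content, uploading identical bytes a second time is a no-op, and the cache hit avoids even the existence check.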
verify_hash_cache_integrity(s3_check_cache_dir, manifest, s3_cas_prefix, s3_bucket)
Inspects a sampling of the assets provided in manifest that are present in the S3 check cache and verifies if the cached assets exist in S3. Returns True if all sampled cached assets exist in S3, False otherwise.
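Sampling-based verification can be sketched as follows. The function name `verify_cache_sample`, the default sample size, and the `exists` predicate are assumptions for illustration; in practice the predicate would be an S3 existence check.

```python
import random


def verify_cache_sample(cached_keys, exists, sample_size=30, seed=None):
    """Return True if every sampled cached key still exists remotely."""
    keys = list(cached_keys)
    rng = random.Random(seed)
    # Never request a sample larger than the population.
    sample = rng.sample(keys, min(sample_size, len(keys)))
    return all(exists(key) for key in sample)


remote = {"Data/a", "Data/b", "Data/c"}
all_present = verify_cache_sample(remote, lambda k: k in remote, seed=0)
stale = verify_cache_sample({"Data/gone"}, lambda k: k in remote, seed=0)
```

A False result suggests the local cache no longer reflects the bucket contents, in which case resetting the cache is the safe response.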