upload

Classes for handling uploading of assets.

S3AssetManager

Asset handler that creates an asset manifest and uploads assets. Based on an S3 file system.

hash_assets_and_create_manifest(asset_groups, total_input_files, total_input_bytes, hash_cache_dir=None, on_preparing_to_submit=None)

Computes the hashes for input files, and creates manifests using the local hash cache.

Parameters:

Name Type Description Default
hash_cache_dir Optional[str]

The path to the local hash cache directory. If None, the default path is used.

None
on_preparing_to_submit Optional[Callable[[Any], bool]]

a callback to be called to periodically report progress to the caller. The callback returns True if the operation should continue as normal, or False to cancel.

None

Returns:

Type Description
SummaryStatistics

a tuple with (1) the summary statistics of the hash operation, and

list[AssetRootManifest]

(2) a list of AssetRootManifest (a manifest and output paths for each asset root).
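
The callback contract described above (return True to continue, False to cancel) can be sketched as follows. This is only an illustration, not the library's implementation: `hash_files`, `stop_after_two`, and the shape of the progress dictionary are hypothetical, and SHA-256 from `hashlib` stands in for whatever hash the library actually uses.

```python
import hashlib
from typing import Any, Callable, Optional


def hash_files(
    contents: dict[str, bytes],
    on_progress: Optional[Callable[[Any], bool]] = None,
) -> tuple[dict[str, str], bool]:
    """Hash each entry and report progress after every file.

    Returns (hashes, cancelled). Hypothetical helper for illustration only.
    """
    hashes: dict[str, str] = {}
    total = len(contents)
    for index, (name, data) in enumerate(contents.items(), start=1):
        hashes[name] = hashlib.sha256(data).hexdigest()  # stand-in hash
        progress = {"processed_files": index, "total_files": total}
        # A False return value from the callback means "cancel now".
        if on_progress is not None and not on_progress(progress):
            return hashes, True
    return hashes, False


def stop_after_two(progress: dict[str, int]) -> bool:
    # Continue only while fewer than two files have been processed.
    return progress["processed_files"] < 2


files = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}
done, cancelled = hash_files(files, on_progress=stop_after_two)
all_done, all_cancelled = hash_files(files)
```

With `stop_after_two`, the loop stops once the second file has been hashed, so the caller gets partial results along with a cancellation flag.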

prepare_paths_for_upload(input_paths, output_paths, referenced_paths, storage_profile=None, require_paths_exist=False)

Processes all of the paths required for upload, grouping them by asset root and local storage profile locations. Returns an object containing the grouped paths, which also includes a dictionary of input directories and file counts for files that were not under the root path or any local storage profile locations.
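
As a rough sketch of the grouping idea, the snippet below sorts paths under a list of known locations and counts, per directory, the files that fall outside all of them. The function `group_paths` and its return shape are hypothetical; the object the library actually returns, and how it chooses asset roots, are not described here.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def group_paths(paths, locations):
    """Group paths under known locations; tally the rest by parent directory."""
    grouped = defaultdict(list)
    outside = Counter()
    roots = [PurePosixPath(loc) for loc in locations]
    for raw in paths:
        path = PurePosixPath(raw)
        # Pick the first location that contains this path, if any.
        root = next((r for r in roots if r in path.parents), None)
        if root is None:
            outside[str(path.parent)] += 1
        else:
            grouped[str(root)].append(str(path))
    return dict(grouped), dict(outside)


paths = [
    "/proj/scene.blend",
    "/proj/tex/wood.png",
    "/shared/lib/hdr.exr",
    "/tmp/a.txt",
    "/tmp/b.txt",
]
grouped, outside = group_paths(paths, ["/proj", "/shared"])
```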

snapshot_assets(snapshot_dir, manifests, on_snapshotting_assets=None)

Copies all the files for provided manifests and manifests themselves into a snapshot directory that matches the layout of a job attachments prefix in S3.

Parameters:

Name Type Description Default
snapshot_dir str

A directory in which to place the snapshot. Data and manifest files will go in Data and Manifest subdirectories, respectively.

required
manifests list[AssetRootManifest]

A list of manifests that contain assets to be uploaded

required
on_snapshotting_assets Optional[Callable[[Any], bool]]

A callback to be called to periodically report progress to the caller. The callback must return True if the operation should continue as normal, or False to cancel.

None

Returns:

Type Description
SummaryStatistics

a tuple with (1) the summary statistics of the snapshot operation, and

Attachments

(2) the S3 path to the asset manifest file.
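
The Data and Manifest layout can be pictured with the sketch below. It is a guess at the general shape rather than the library's actual behavior: naming data files by their content hash, the JSON manifest format, the manifest file name, and the `snapshot` helper itself are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def snapshot(snapshot_dir: Path, files: dict[str, bytes]) -> Path:
    """Write files under Data/ and a manifest under Manifest/ (illustrative)."""
    data_dir = snapshot_dir / "Data"
    manifest_dir = snapshot_dir / "Manifest"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_dir.mkdir(parents=True, exist_ok=True)

    entries = []
    for rel_path, content in sorted(files.items()):
        digest = hashlib.sha256(content).hexdigest()  # stand-in hash
        # Assumption: data files are stored under their content hash.
        (data_dir / digest).write_bytes(content)
        entries.append({"path": rel_path, "hash": digest, "size": len(content)})

    manifest_path = manifest_dir / "example_manifest.json"  # hypothetical name
    manifest_path.write_text(json.dumps({"paths": entries}, indent=2))
    return manifest_path


with tempfile.TemporaryDirectory() as tmp:
    manifest_file = snapshot(
        Path(tmp), {"a.txt": b"alpha", "copy_of_a.txt": b"alpha"}
    )
    data_files = sorted(p.name for p in (Path(tmp) / "Data").iterdir())
    manifest_entries = json.loads(manifest_file.read_text())["paths"]
```

In this sketch, two input files with identical content produce a single file under Data, while the manifest still lists both paths.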

upload_assets(manifests, on_uploading_assets=None, s3_check_cache_dir=None, manifest_write_dir=None)

Uploads all the files for provided manifests and manifests themselves to S3.

Parameters:

Name Type Description Default
manifests list[AssetRootManifest]

a list of manifests that contain assets to be uploaded

required
on_uploading_assets Optional[Callable[[Any], bool]]

a callback to be called to periodically report progress to the caller. The callback returns True if the operation should continue as normal, or False to cancel.

None

Returns:

Type Description
SummaryStatistics

a tuple with (1) the summary statistics of the upload operation, and

Attachments

(2) the S3 path to the asset manifest file.

S3AssetUploader

Handler for uploading assets to S3 based on an asset manifest. If no session is provided, the default credentials path is used. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials

file_already_uploaded(bucket, key)

Checks whether the file has already been uploaded by making a head-object call.

reset_s3_check_cache(s3_check_cache_dir)

Resets the S3 check cache by removing the cache altogether.

upload_assets(job_attachment_settings, manifest, source_root, partial_manifest_prefix=None, file_system_location_name=None, progress_tracker=None, s3_check_cache_dir=None, manifest_write_dir=None, manifest_name_suffix='input', manifest_metadata=dict(), manifest_file_name=None, asset_root=None)

Uploads assets based on an asset manifest, then uploads the asset manifest itself.

Parameters:

Name Type Description Default
manifest BaseAssetManifest

The asset manifest to upload.

required
partial_manifest_prefix Optional[str]

The (partial) key prefix to use for uploading the manifest to S3, excluding the initial "/Manifest/" section, e.g. "farm-1234/queue-1234/Inputs/".

None
source_root Path

The local root path of the assets.

required
job_attachment_settings JobAttachmentS3Settings

The job attachment settings configured on the queue.

required
progress_tracker Optional[ProgressTracker]

Optional progress tracker to track progress.

None
manifest_name_suffix str

Suffix to use when naming the manifest.

'input'
manifest_metadata dict[str, dict[str, str]]

File metadata for the manifest to be uploaded.

dict()
manifest_file_name Optional[str]

Optional file name for the manifest to be uploaded. If omitted, a default name is used.

None
asset_root Optional[Path]

The root in which the assets are actually located, used to facilitate path mapping.

None

Returns:

Type Description
tuple[str, str]

A tuple of (the partial key for the manifest on S3, the hash of input manifest).

upload_file_to_s3(local_path, s3_bucket, s3_upload_key, progress_tracker=None, base_dir_path=None)

Uploads a single file to an S3 bucket using TransferManager, allowing mid-way cancellation. It monitors upload progress through a callback, handler, which also checks whether the upload should continue. If the progress_tracker signals to stop, the ongoing upload is cancelled.
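
The cancellation pattern can be illustrated without S3 by copying between in-memory streams. In this sketch, `Tracker` and `copy_with_cancel` are hypothetical stand-ins for the progress tracker and the transfer; the real method goes through boto3's transfer machinery, which is not reproduced here.

```python
import io


class Tracker:
    """Hypothetical progress tracker: counts bytes and can ask to stop."""

    def __init__(self, stop_after_bytes=None):
        self.bytes_done = 0
        self.stop_after_bytes = stop_after_bytes

    def track(self, num_bytes: int) -> bool:
        """Record progress; return False once the stop threshold is reached."""
        self.bytes_done += num_bytes
        if self.stop_after_bytes is None:
            return True
        return self.bytes_done < self.stop_after_bytes


def copy_with_cancel(src, dst, tracker, chunk_size=4) -> bool:
    """Copy in chunks; return True if completed, False if cancelled."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return True
        dst.write(chunk)
        # The per-chunk handler both reports progress and checks for a stop.
        if not tracker.track(len(chunk)):
            return False


payload = b"0123456789"

cancelled_dst = io.BytesIO()
completed = copy_with_cancel(
    io.BytesIO(payload), cancelled_dst, Tracker(stop_after_bytes=8)
)

full_dst = io.BytesIO()
completed_full = copy_with_cancel(io.BytesIO(payload), full_dst, Tracker())
```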

upload_input_files(manifest, s3_bucket, source_root, s3_cas_prefix, progress_tracker=None, s3_check_cache_dir=None)

Uploads all of the files listed in the given manifest to S3 if they don't exist in the given S3 prefix already.

The local 'S3 check cache' is used to note if we've seen an object in S3 before so we can save the S3 API calls.
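
To show why a local check cache saves API calls, the sketch below counts the remote existence checks made across two runs. `CheckCache` and `upload_missing` are hypothetical, the cache is held in memory rather than in a cache directory, and a plain `set` stands in for the bucket.

```python
class CheckCache:
    """In-memory stand-in for a local 'S3 check cache' (hypothetical)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def contains(self, key: str) -> bool:
        return key in self._seen

    def add(self, key: str) -> None:
        self._seen.add(key)


def upload_missing(keys, cache, exists_in_s3, upload) -> int:
    """Upload keys not known to exist; return the number of remote checks."""
    checks = 0
    for key in keys:
        if cache.contains(key):
            continue  # seen before, so skip the existence check entirely
        checks += 1
        if not exists_in_s3(key):
            upload(key)
        cache.add(key)  # the object is now known to exist remotely
    return checks


bucket = {"Data/aaa"}  # a set standing in for the objects in a bucket
cache = CheckCache()
keys = ["Data/aaa", "Data/bbb"]

first_run_checks = upload_missing(keys, cache, bucket.__contains__, bucket.add)
second_run_checks = upload_missing(keys, cache, bucket.__contains__, bucket.add)
```

The trade-off is that a cache entry can outlive the object it describes, for example if the object is later deleted from the bucket.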

upload_object_to_cas(file, hash_algorithm, s3_bucket, source_root, s3_cas_prefix, s3_check_cache, progress_tracker=None)

Uploads an object to the S3 content-addressable storage (CAS) prefix. Optionally, does a head-object check and only uploads the file if it doesn't exist in S3 already. Returns a tuple (whether it has been uploaded, the file size).
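
Content-addressable storage means the object key is derived from the file's content, so identical content maps to the same key. The sketch below assumes a key layout of prefix, hash, and algorithm suffix, and uses SHA-256 with a dictionary standing in for the bucket. Both `cas_key` and `upload_to_cas` are hypothetical, and the library's actual key format and hash algorithm may differ.

```python
import hashlib


def cas_key(content: bytes, s3_cas_prefix: str, algorithm: str = "sha256") -> str:
    """Build an object key from the content hash (assumed layout)."""
    digest = hashlib.new(algorithm, content).hexdigest()
    return f"{s3_cas_prefix.rstrip('/')}/{digest}.{algorithm}"


def upload_to_cas(content, bucket, s3_cas_prefix, check_exists=True):
    """Store content in a dict standing in for a bucket.

    Returns (was_uploaded, size), mirroring the tuple described above.
    """
    key = cas_key(content, s3_cas_prefix)
    # Optional existence check: skip the upload if the key is already present.
    if check_exists and key in bucket:
        return False, len(content)
    bucket[key] = content
    return True, len(content)


bucket = {}
first = upload_to_cas(b"alpha", bucket, "root/Data")
second = upload_to_cas(b"alpha", bucket, "root/Data")
```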

verify_hash_cache_integrity(s3_check_cache_dir, manifest, s3_cas_prefix, s3_bucket)

Inspects a sample of the assets in the manifest that are present in the S3 check cache and verifies whether the cached assets exist in S3. Returns True if all sampled cached assets exist in S3, False otherwise.
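
The sampling check can be sketched as follows. The sample size, the random selection, and the `verify_cache_sample` helper are assumptions for illustration, since how many assets the library samples and how it picks them are not stated here.

```python
import random


def verify_cache_sample(cached_keys, exists_in_s3, sample_size=3, seed=None):
    """Return True if every sampled cached key still exists remotely."""
    keys = list(cached_keys)
    rng = random.Random(seed)
    # Never ask for more samples than there are cached keys.
    sample = rng.sample(keys, min(sample_size, len(keys)))
    return all(exists_in_s3(key) for key in sample)


bucket = {"k1", "k2", "k3"}  # a set standing in for the objects in a bucket
cached = ["k1", "k2", "k3"]

intact = verify_cache_sample(cached, bucket.__contains__, sample_size=3)

bucket.clear()  # simulate the objects being removed from the bucket
stale = verify_cache_sample(cached, bucket.__contains__, sample_size=3)
```

Because only a sample is checked, a True result indicates that the cache is probably consistent, not that every cached entry has been confirmed.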