Artifact

Comet Artifacts allow you to keep track of any data associated with the ML lifecycle. Depending on your application, you can either upload the dataset directly to Comet or use the Remote Artifacts feature to store a reference to it instead. Whichever option you choose, Comet maintains the lineage between your datasets and the training runs that created or consumed them.

Artifacts live at the Comet Workspace level and are identified by their name. Each Artifact can have multiple versions, which lets you keep track of exactly which dataset was used.

Logging an Artifact

Logging an Artifact has three steps:

  • Create an Artifact Version
  • Add files and folders to this Artifact Version
  • Log this Artifact Version to an Experiment

Creating an Artifact Version

To log an Artifact, you must first create an Artifact instance to which you then add some files or folders. This Artifact can then be uploaded to Comet through the Experiment object.

When creating an Artifact object, you can specify the version number as well as aliases, metadata and version tags. These parameters allow you to keep all your Artifacts and their versions organized and make them easier to query.

The version parameter is optional. If you don't specify it, Comet auto-increments to the next major version number.
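
To make the auto-increment rule concrete, here is a minimal sketch of a major-version bump on a plain MAJOR.MINOR.PATCH string, next to a minor bump for comparison. It assumes standard semantic-versioning bump rules; the helper names next_major and next_minor are hypothetical stand-ins written for this illustration and are not part of the Comet SDK.

```python
def _parse(version: str) -> tuple:
    """Split a plain MAJOR.MINOR.PATCH string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def next_major(version: str) -> str:
    """Hypothetical helper: bump the major component, reset minor and patch."""
    major, _minor, _patch = _parse(version)
    return f"{major + 1}.0.0"


def next_minor(version: str) -> str:
    """Hypothetical helper: bump the minor component, reset patch."""
    major, minor, _patch = _parse(version)
    return f"{major}.{minor + 1}.0"
```

Under this assumption, if the latest logged version is 1.2.3, omitting the version parameter would correspond to next_major("1.2.3"), while an explicit minor bump corresponds to next_minor("1.2.3"). Pre-release and build suffixes are not handled by this sketch.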

Adding files and folders to an Artifact Version

After creating an Artifact Version, you can add files and folders to it. These files and folders are collectively referred to as "artifact assets" and fall into two categories, "artifact assets" and "remote artifact assets":

  • Artifact assets: files and folders whose content is uploaded to Comet.

  • Remote artifact assets: files and folders for which Comet stores only a reference, not the content itself. A remote artifact asset can be any string, which allows easy integration with your existing data versioning system.

Note

If a remote artifact asset is a GCS or S3 bucket path, Comet has a few special tricks up its sleeve. Assuming the credentials are correctly configured, when you log an S3 bucket as a remote artifact asset, Comet automatically keeps track of all the files in that bucket and lets you download them easily. You get all the lineage benefits of Artifacts without needing to upload your data to another location!

Learn more about S3 and GCP support for Remote Artifact Assets.

You can add artifact assets to an artifact using the following methods:

  • To log an artifact asset: Artifact.add
  • To log a remote artifact asset: Artifact.add_remote
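
As an illustration of how the two methods divide the work, the sketch below routes a path to Artifact.add_remote when it looks like a bucket URI and to Artifact.add otherwise. The helper names (is_remote_uri, add_asset) and the REMOTE_SCHEMES set are assumptions made for this example, not part of the Comet SDK; the artifact argument is any object exposing add and add_remote, such as a comet_ml Artifact.

```python
from urllib.parse import urlparse

# Assumption for this sketch: only S3 ("s3://") and GCS ("gs://") URIs
# are logged by reference; everything else is treated as a local path.
REMOTE_SCHEMES = {"s3", "gs"}


def is_remote_uri(uri: str) -> bool:
    """Return True when the URI scheme marks data that should stay in place."""
    return urlparse(uri).scheme in REMOTE_SCHEMES


def add_asset(artifact, uri: str) -> None:
    """Hypothetical helper: add `uri` by reference if remote, by content otherwise."""
    if is_remote_uri(uri):
        artifact.add_remote(uri)  # Comet stores only the reference
    else:
        artifact.add(uri)  # the file or folder content is uploaded
```

A wrapper like this keeps the choice between uploading and referencing in one place, which can be convenient when a dataset mixes local files and bucket paths.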

Log an Artifact to Comet

When you are ready to send the Artifact to Comet, you will log it with Experiment.log_artifact(ARTIFACT).

Let's look at a full example:

Using artifact.add
from comet_ml import Artifact, Experiment

experiment = Experiment(
    api_key="<Your API Key>",
    project_name="<Your Project Name>"
)

artifact = Artifact(name="artifact-name", artifact_type="dataset")
artifact.add("./local-file")

experiment.log_artifact(artifact)
experiment.end()

Using artifact.add_remote
from comet_ml import Artifact, Experiment

experiment = Experiment(
    api_key="<Your API Key>",
    project_name="<Your Project Name>"
)

artifact = Artifact(name="artifact-name", artifact_type="dataset")
artifact.add_remote("s3://...")

experiment.log_artifact(artifact)
experiment.end()

You can find the full reference documentation for the Artifact object here.

Access a logged Artifact version

You can retrieve a logged Artifact from any workspace that you have permission to access by passing the Artifact name and the workspace name to the Experiment.get_artifact() method:

logged_artifact = experiment.get_artifact(
    NAME,
    WORKSPACE,
    version_or_alias=VERSION_OR_ALIAS)

To make it easier to access the artifact, you can retrieve a logged Artifact in three ways, all controlled by the version_or_alias argument:

  • Get the latest Artifact version by leaving out version_or_alias.
  • Get a specific Artifact version by passing the version as version_or_alias.
  • Get an aliased Artifact version by passing the alias as version_or_alias.
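
The three retrieval modes can be sketched side by side as follows. The function name fetch_three_ways is hypothetical, and the version and alias values you pass in are your own; the only SDK call used is Experiment.get_artifact() with the version_or_alias argument shown above.

```python
def fetch_three_ways(experiment, name, workspace, version, alias):
    """Hypothetical helper: retrieve one Artifact in each of the three ways.

    `experiment` is expected to expose get_artifact(), as a comet_ml
    Experiment does.
    """
    # 1. Latest version: omit version_or_alias.
    latest = experiment.get_artifact(name, workspace)

    # 2. Specific version: pass the version string.
    pinned = experiment.get_artifact(name, workspace, version_or_alias=version)

    # 3. Aliased version: pass the alias string.
    aliased = experiment.get_artifact(name, workspace, version_or_alias=alias)

    return latest, pinned, aliased
```

For example, fetch_three_ways(experiment, "artifact-name", WORKSPACE, "1.0.0", "prod") would request the latest version, version 1.0.0, and whichever version carries the alias "prod", where both "1.0.0" and "prod" are placeholder values.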

Once you have retrieved a logged Artifact, you can then either:

  • Download the artifact so you can use it within your training scripts
  • Inspect the contents of an artifact without downloading it locally first

Download a logged Artifact

Downloading a logged Artifact brings the following assets to your disk:

  • All non-remote assets
  • S3 and GCP remote assets, if authentication to these services is configured

This action also records that the new Experiment has accessed the Artifact, for tracking the data flow in your pipeline.

from comet_ml import Experiment

experiment = Experiment(
    api_key="<Your API Key>",
    project_name="<Your Project Name>"
)
logged_artifact = experiment.get_artifact(
    "artifact-name",
    "<Your Workspace Name>",
)

# Download the artifact:
local_artifact = logged_artifact.download("/data/input")

Inspect a logged Artifact

The LoggedArtifact.assets attribute contains all the logged assets for a given Artifact version. You can distinguish between remote and non-remote assets using the remote attribute of each asset, for example:

from comet_ml import Experiment

experiment = Experiment(
    api_key="<Your API Key>",
    project_name="<Your Project Name>"
)
logged_artifact = experiment.get_artifact(
    "artifact-name",
    "<Your Workspace Name>",
)

# Access asset information without downloading the assets
for asset in logged_artifact.assets:
    if asset.remote:
        print(asset.link)
    else:
        print(asset.logical_path)
        print(asset.size)
    print(asset.metadata)
    print(asset.asset_type)
    print(asset.id)
    print(asset.artifact_version_id)
    print(asset.artifact_id)
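
Building on the attributes printed above, the following sketch summarizes an Artifact version before deciding whether to download it. The function name summarize_assets is hypothetical; it relies only on the remote, link and size attributes used in the listing above, and it treats a missing size as zero, which is an assumption of this example.

```python
def summarize_assets(assets):
    """Hypothetical helper: split assets into remote links and local byte count."""
    remote_links = []
    local_bytes = 0
    for asset in assets:
        if asset.remote:
            # Remote assets carry a reference rather than uploaded content.
            remote_links.append(asset.link)
        else:
            # Assumption: a size of None is counted as 0 bytes.
            local_bytes += asset.size or 0
    return {"remote_links": remote_links, "local_bytes": local_bytes}
```

Calling summarize_assets(logged_artifact.assets) returns a dictionary with the list of remote references and the total size of the uploaded assets, without downloading anything.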

Update an Artifact version

Artifact versions are immutable: once an Artifact version is logged, it can no longer be updated, which keeps the lineage accurate. You can, however, create a new Artifact version using a previous version as a starting point.

Here is how you can retrieve an existing Artifact version, add a new file, compute the new version and log it:

from comet_ml import Experiment

experiment = Experiment(
    api_key="<Your API Key>",
    project_name="<Your Project Name>"
)

logged_artifact = experiment.get_artifact("artifact-name", "<Your Workspace Name>")

local_artifact = logged_artifact.download("/data/input")

local_artifact.add("./new-file")
local_artifact.version = logged_artifact.version.next_minor()

experiment.log_artifact(local_artifact)

Dec. 19, 2023