Tuesday, August 13, 2024

ArtiPACKED: Hacking Giants Through a Race Condition in GitHub Actions Artifacts

Executive Summary

This research reviews an attack vector that allows the compromise of GitHub repositories, which has severe consequences in itself and could also lead to high-level access to cloud environments. The attack abuses GitHub Actions artifacts generated as part of organizations’ CI/CD workflows. A combination of misconfigurations and security flaws can cause artifacts to leak tokens, both for third-party cloud services and for GitHub itself, making them available to anyone with read access to the repository. Malicious actors with access to these artifacts could then compromise the services to which these secrets grant access. In most of the vulnerable projects we discovered during this research, the leaked secrets were GitHub tokens, which allow an attacker to act against the triggering GitHub repository. This could lead to malicious code being pushed and flowing to production through the CI/CD pipeline, or to access to secrets stored in the GitHub repository and organization.

While the research applies to both private and public GitHub repositories, this article focuses on the discovery of vulnerable public repositories. We uncovered vulnerable high-profile open-source projects owned by some of the biggest companies in the world; before mitigation, these weaknesses could have affected millions of their consumers. All of the disclosed cases were reported to the maintainers of these projects. We received great support from all teams and were able to collaborate to mitigate all of the discoveries quickly and efficiently.

CI/CD environments, processes and systems are an essential part of modern software organizations. They’re responsible for the crucial flow of building, testing and delivering code to production. Naturally, CI/CD pipelines use highly sensitive credentials to authenticate against various types of services, which makes it a significant challenge to maintain a high level of credential hygiene. This article covers the potential impact of insecure usage of GitHub Actions artifacts, as well as the methods and tools to protect against this threat.

Palo Alto Networks customers are better protected from the threats discussed above through the following products:

Exploring Workflow Artifacts

Knowing how sensitive CI/CD systems are, I had to follow a hunch I had about an overlooked feature called workflow artifacts in the leading source control platform and home of many open-source projects, GitHub.

I was quite convinced I’d find sensitive data or credentials, and as it turned out, the discovery was even bigger than what I had envisioned. In fact, it impacted well-known open-source projects owned by Red Hat, Google, AWS, Canonical (Ubuntu), Microsoft, OWASP and others — and potentially reached millions of their product users.

GitHub Actions Build Artifacts

In GitHub Actions, workflow build artifacts offer a powerful mechanism for persisting and sharing data across jobs within the same workflow. These artifacts can be any files generated during your build process, such as compiled code, test reports or deployment packages.

Artifacts ensure critical data isn't lost after a workflow finishes, making the data accessible for later analysis or deployment. This is particularly useful for sharing test results or deployment packages between dependent jobs. Overall, workflow build artifacts streamline your workflows by facilitating data transfer and promoting efficient execution within the GitHub Actions environment.

The Hunch

GitHub Actions workflows frequently use secrets to interact with various cloud services and with GitHub itself. These secrets include the ephemeral, automatically created GITHUB_TOKEN used to perform actions against the repository. The Actions build artifacts are outputs generated by the execution of workflows, and once created, they’re stored for up to 90 days. In open-source projects, these artifacts are publicly available for anyone to consume.

So why not scan these artifacts for secrets?

Figure 1. GitHub Actions artifact.

This approach offers a straightforward method for identifying potential security risks.

I then compiled a list of popular open-source projects on GitHub and automated the sequence of downloading their artifacts and scanning them for secrets.
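The listing half of such automation can be sketched against GitHub's REST API, which lists a repository's artifacts at GET /repos/{owner}/{repo}/actions/artifacts. The snippet below is a minimal Python sketch for auditing your own repositories, not the tooling used in this research. The helper names are hypothetical, and the download step (which needs an authenticated request) is left out; the code only builds the listing URL and picks the unexpired artifacts out of a response body.

```python
import json

API_ROOT = "https://api.github.com"


def artifacts_url(owner: str, repo: str, per_page: int = 100) -> str:
    """Build the REST URL that lists a repository's workflow artifacts."""
    return f"{API_ROOT}/repos/{owner}/{repo}/actions/artifacts?per_page={per_page}"


def downloadable_artifacts(response_body: str) -> list[dict]:
    """Return name and download URL for every artifact that has not expired."""
    payload = json.loads(response_body)
    return [
        {"name": artifact["name"], "url": artifact["archive_download_url"]}
        for artifact in payload.get("artifacts", [])
        if not artifact.get("expired", False)
    ]


if __name__ == "__main__":
    # Hypothetical response, trimmed to the fields this sketch reads.
    sample = json.dumps({
        "total_count": 2,
        "artifacts": [
            {"name": "build-output", "expired": False,
             "archive_download_url": f"{API_ROOT}/repos/o/r/actions/artifacts/1/zip"},
            {"name": "old-logs", "expired": True,
             "archive_download_url": f"{API_ROOT}/repos/o/r/actions/artifacts/2/zip"},
        ],
    })
    print(artifacts_url("o", "r"))
    print(downloadable_artifacts(sample))
```

Each returned URL points at a zip archive, which is what a secret scanner would then inspect.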

Found Some Tokens, Now What?

My hunch was spot on. I found working tokens for various cloud services, including music streaming, cloud infrastructure and more. I also found something far more interesting — various GitHub tokens. Using them, though, was not straightforward.

Let's understand why and take a technical dive into the different types of tokens created by GitHub when a workflow runs.

How GitHub Tokens Find Their Way into Artifacts

Two types of GitHub tokens kept popping up: GITHUB_TOKEN, which has a prefix of ghs_, and ACTIONS_RUNTIME_TOKEN, which is a JWT (JSON Web Token).

It's important to note that these tokens weren’t part of the repository code but were only found in repository-produced artifacts. Before determining what I could do with them, I wanted to know how these tokens ended up inside artifacts in the first place.
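To make the two token shapes concrete, here is a minimal Python sketch of how a scanner might recognize them in text. The exact patterns are assumptions for illustration: a GITHUB_TOKEN is matched as the ghs_ prefix followed by at least 36 alphanumeric characters, and a JWT as three dot-separated base64url segments whose first two start with "eyJ" (the encoding of an opening JSON brace and quote). A real scanner would use broader and better-tested rules.

```python
import re

# Assumed token shapes, for illustration only.
TOKEN_PATTERNS = {
    "GITHUB_TOKEN": re.compile(r"\bghs_[A-Za-z0-9]{36,}\b"),
    "JWT": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def find_tokens(text: str) -> list[tuple[str, str]]:
    """Return (kind, match) pairs for every token-like string found in text."""
    hits = []
    for kind, pattern in TOKEN_PATTERNS.items():
        hits.extend((kind, match.group(0)) for match in pattern.finditer(text))
    return hits


if __name__ == "__main__":
    # Fake values that only mimic the shapes above.
    fake_log = "token=ghs_" + "a1B2" * 9 + " jwt=eyJhbGciOiJub25lIn0.eyJleHAiOjF9.c2ln"
    for kind, value in find_tokens(fake_log):
        print(kind, value[:12] + "...")
```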

Most GitHub users use the actions/checkout GitHub action for the obvious need of cloning their repository code for availability during the workflow run. The default behavior of actions/checkout is to persist credentials, which means the GITHUB_TOKEN is written to the local git directory, enabling it to run authenticated git commands against the repository. Most users, I’m willing to bet, aren’t aware of this default behavior and don't require the functionality. In many cases, after all, a simple clone is all that’s required for the workflow to do its job.

Figure 2. A base64-encoded GitHub token, publicly accessible and embedded in an artifact of the CycloneDX project by OWASP.

From what I’ve seen, users commonly — and mistakenly — upload their entire checkout directory as an artifact. The directory contains the hidden .git folder that stores the persisted GITHUB_TOKEN, leading the publicly accessible artifacts to contain the GITHUB_TOKEN.
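To check whether a checkout directory carries a persisted credential, you can inspect its .git/config. The Python sketch below assumes the layout that actions/checkout has used for persisted credentials, to my understanding: an http extraheader entry of the form "AUTHORIZATION: basic <base64>", where the base64 value decodes to "x-access-token:<token>". The function name is hypothetical, and the layout may differ between action versions.

```python
import base64
import re
from typing import Optional

# Assumed entry written by the checkout step:
#   extraheader = AUTHORIZATION: basic <base64 of "x-access-token:TOKEN">
EXTRAHEADER = re.compile(
    r"^\s*extraheader\s*=\s*AUTHORIZATION:\s*basic\s+(\S+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def persisted_token(git_config_text: str) -> Optional[str]:
    """Return the token persisted in a .git/config, or None if there is none."""
    match = EXTRAHEADER.search(git_config_text)
    if match is None:
        return None
    try:
        decoded = base64.b64decode(match.group(1)).decode("utf-8", errors="replace")
    except ValueError:
        return None  # not valid base64, so nothing to report
    # Everything after the first colon is treated as the token.
    _user, _sep, token = decoded.partition(":")
    return token or None


if __name__ == "__main__":
    fake = base64.b64encode(b"x-access-token:ghs_" + b"x" * 36).decode()
    config = f'[http "https://github.com/"]\n\textraheader = AUTHORIZATION: basic {fake}\n'
    print(persisted_token(config) is not None)
```

A non-None result for a directory that is about to be uploaded means the artifact would carry the credential with it.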

As seen in Figure 3, the microsoft/typescript-bot-test-triggerer project uploaded the entire checkout directory as an artifact, along with the persisted GITHUB_TOKEN stored in the .git directory.

Figure 3. Example of a Microsoft repository workflow uploading a valid GITHUB_TOKEN in an artifact.

Another mistake that exposed GitHub tokens in public artifacts involved super-linter, a well-known open-source code linter with a widely used fork maintained by GitHub.

When the CREATE_LOG_FILE property of super-linter is set to True, super-linter creates a log file with extensive detail, including environment variables. CI/CD pipelines usually load secrets, GitHub tokens included, as environment variables, so logging them isn’t a good idea.

The super-linter log file is often uploaded as a build artifact for reasons like debuggability and maintenance. But this practice exposed sensitive tokens of the repository.

I reported this to the maintainers of super-linter, and environment variables are no longer printed to its log file. The GitHub version was also updated.
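For tools that do need to log part of the environment for debugging, one common safeguard is to mask values whose names look sensitive. The Python sketch below illustrates that idea only; it is not super-linter's fix, which, as noted above, was to stop printing environment variables altogether. The name heuristic is an assumption and will miss secrets stored under innocuous names.

```python
import os
import re

# Hypothetical heuristic: variable names containing these words are treated as secret.
SENSITIVE_NAME = re.compile(r"TOKEN|SECRET|PASSWORD|KEY|CREDENTIAL", re.IGNORECASE)


def redacted_environment(env: dict[str, str]) -> dict[str, str]:
    """Copy env, masking the value of every variable with a sensitive-looking name."""
    return {
        name: "***" if SENSITIVE_NAME.search(name) else value
        for name, value in env.items()
    }


if __name__ == "__main__":
    # Print the current environment with sensitive-looking values masked.
    for name, value in sorted(redacted_environment(dict(os.environ)).items()):
        print(f"{name}={value}")
```

Not logging the environment at all remains the safer choice, since a name-based filter cannot know which values are actually secret.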

Abusing Leaked GitHub Tokens

And now, moving on to abusing these tokens.

The obvious choice would be to use the widely used GITHUB_TOKEN against the repository. It’s an ephemeral token created for every workflow job run and designed to let workflows interact with GitHub resources, like the workflow’s repository. Its permissions can be restricted, and it expires when the job completes; both properties limit the risk if the token leaks.

During my research, though, I discovered that workflow artifacts are only available for download after the entire workflow finishes. Since the GITHUB_TOKEN expires when the job ends, the token would already be invalid by the time I could download the artifact and extract it. Bummer! (Spoiler: This is just the beginning.)

That left me with repositories exposing their ACTIONS_RUNTIME_TOKEN, a JWT with an expiration of about six hours according to its exp (expiration) claim. ACTIONS_RUNTIME_TOKEN is an undocumented environment variable used by several popular GitHub-owned actions, such as actions/cache and actions/upload-artifact, to manage caching and artifacts. Caching speeds up workflows by storing and reusing downloaded files or build results. We’re already familiar with the role of artifacts.

Figure 4. Decoded ACTIONS_RUNTIME_TOKEN JWT.
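Reading the exp claim requires no key, because a JWT payload is only base64url-encoded. The following Python sketch decodes the payload without verifying the signature and reports how long a token has left. The function names are hypothetical, and the sample token is built locally with an arbitrary exp value.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload without verifying its signature."""
    _header, payload, _signature = token.split(".")
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now: datetime) -> float:
    """Return the time left before the token's exp claim, in seconds."""
    expires = datetime.fromtimestamp(jwt_claims(token)["exp"], tz=timezone.utc)
    return (expires - now).total_seconds()


if __name__ == "__main__":
    def encode(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    # Locally built sample whose exp lies six hours after the chosen "now".
    sample = ".".join([encode({"alg": "none"}), encode({"exp": 1700021600}), "sig"])
    now = datetime.fromtimestamp(1700000000, tz=timezone.utc)
    print(seconds_until_expiry(sample, now) / 3600, "hours left")
```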

By tracking a workflow run from a project that leaked a token, I could download its artifacts within the six-hour window before the token expired. The extracted token could then be used to manage the cache and artifacts.

But workflow run times are unpredictable unless the workflow is triggered on a schedule (cron), so I automated a process that downloads an artifact, extracts the ACTIONS_RUNTIME_TOKEN and uses it to replace the artifact with a malicious one.

Subsequent workflow jobs often rely on previously uploaded artifacts. Cases of this kind open the door for remote code execution (RCE) on the runner that runs the job consuming the malicious artifact. RCE can also occur if developers download and execute a malicious artifact, leading to compromised workstations.

The video below demonstrates an attack on the SchemeCrawler project. I identified a public artifact that contains the ACTIONS_RUNTIME_TOKEN and used it to upload my own malicious artifact to replace the existing one.

Figure 5. A recorded attack on project SchemeCrawler, where I’ve injected a “malicious” artifact.

The GITHUB_TOKEN Plot Twist

Cool as it was, I craved more. There were a lot of cases where I had a leaked GITHUB_TOKEN, and I wanted to use it and push unreviewed code to the repository. But as I mentioned, these tokens were useless.

Then, with incredible timing, GitHub announced version 4 of the artifacts feature. It has impressive improvements, like 10x faster uploads. But one particular detail struck me as an immediate call to action:

“Another common request from our users was the ability to download artifacts from the UI or API while the workflow run is in progress.”

As I read this sentence, my researcher spidey-senses tingled. It suggested that a race condition had just become possible, allowing the leaked GITHUB_TOKEN to be downloaded, extracted and used before the job finished and the token expired.

An attack flow might resemble the following:

  1. The attacker waits for a pipeline to be triggered.
  2. The repository triggers a pipeline.
  3. The pipeline inadvertently uploads an artifact that includes the GITHUB_TOKEN.
  4. Before the workflow job finishes, an attacker downloads the publicly available artifact.
  5. The attacker extracts the token from the artifact and uses it to push malicious code to the repository.
  6. The pipeline job ends, and the GITHUB_TOKEN is invalidated.

Figure 6. Attack flow.

Pushing Code Before the Clock Runs Out

First, I created a list of open-source projects using the upload-artifact@v4 action. The list quickly grew, especially since GitHub announced the deprecation of v3, effective November 2024. Dependency-update bots automatically create pull requests that upgrade to v4, which accelerated this growth even further. I scanned the artifacts of each of these projects for secrets and focused on the ones exposing their GITHUB_TOKEN.

It was time for my first attempt to push code to an open-source project. To avoid harming the project, I decided that creating a branch was sufficient, as it requires write permissions, just as pushing code does.

I chose a project from the list where the workflow had the contents: write permission. Spoiler alert: Most of them did, which wasn't surprising, given my previous work exploring how popular open-source projects manage their workflows’ permissions.

No luck exploiting tokens! Every time I tried to use the leaked token, it had already expired, producing a consistent "401 Unauthorized: message: Bad Credentials" error. Artifacts are usually uploaded as the last step of the job, so the job ends right after the upload completes. Downloading and extracting the vulnerable artifact proved just slow enough for the token to expire before I could use it. Reviewing the workflow build logs revealed the reason it failed: a two-second delay.
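The timing constraint reduces to a single comparison: the token is usable only if the time needed to download the artifact, extract the token and make the first API call is shorter than the gap between the artifact becoming downloadable and the job ending. The Python sketch below models that comparison. The class, the function and every number in it are illustrative assumptions, not measurements from this research.

```python
from dataclasses import dataclass


@dataclass
class JobTimeline:
    upload_done: float  # seconds into the job when the artifact becomes downloadable
    job_end: float      # seconds into the job when it finishes and the token expires

    @property
    def window(self) -> float:
        """Seconds during which a leaked token is both obtainable and valid."""
        return max(0.0, self.job_end - self.upload_done)


def token_usable(timeline: JobTimeline, latency: float) -> bool:
    """True if download, extraction and first use all fit inside the window."""
    return latency < timeline.window


if __name__ == "__main__":
    upload_is_last_step = JobTimeline(upload_done=120.0, job_end=122.0)
    steps_follow_upload = JobTimeline(upload_done=120.0, job_end=180.0)
    print(token_usable(upload_is_last_step, latency=5.0))
    print(token_usable(steps_follow_upload, latency=5.0))
```

For defenders, the same model shows why every step that runs after an artifact upload widens the exposure window.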

I returned to my list and selected a project where the artifact upload step didn’t bring the job to an end but was followed by additional steps, granting me an opportunity to steal and use the token before it expired.

It worked! I was able to create a branch (a write operation) in clair, an open-source project, even though, as an external contributor, I obviously don’t have permission to do that. I could have pushed code by following the same process.

Figure 7. Creation of branch impala in the “clair” open-source project by Red Hat.

Figure 8. Screen recording of the actual attack.

Let’s Win More Races

While I successfully exploited the issue, I wanted to broaden the attack's applicability. Previously, the attack relied on the workflow job having subsequent steps after the artifact upload, granting me a window to use the token. To improve the success rate, I applied some good old engineering to make it more robust.

Downloading the artifact to my own machine was too slow.

I needed to be closer to the target, and GitHub Actions presented a perfect solution. A workflow can be triggered remotely and runs on the same cloud infrastructure as the targets, which means lower latency and much faster downloads, plus high configurability.

I needed to further optimize performance and reduce communication time. Since artifacts are compressed, I selectively extracted only the git config file, skipping most of the archive content. I also sent dozens of requests per second while staying under the GitHub rate limit, and disabled certificate verification.
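Selective extraction is straightforward because a zip archive carries a directory of its members, so a reader can decompress one file and ignore the rest. The Python sketch below reads only .git/config entries from an archive held in memory. It is an illustration rather than the code used in this research, the function name is hypothetical, and it assumes the archive has already been fully downloaded.

```python
import io
import zipfile


def read_git_configs(archive_bytes: bytes) -> dict[str, str]:
    """Return the text of every .git/config in a zip archive, skipping all other members."""
    configs = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name == ".git/config" or name.endswith("/.git/config"):
                # Only this member is decompressed; the rest stay untouched.
                configs[name] = archive.read(name).decode("utf-8", errors="replace")
    return configs


if __name__ == "__main__":
    # Build a small sample archive in memory.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as sample:
        sample.writestr(".git/config", "[core]\n\tbare = false\n")
        sample.writestr("dist/app.js", "x" * 10_000)
    print(list(read_git_configs(buffer.getvalue())))
```

The same check is useful defensively: an artifact in which this function finds anything probably contains a whole checkout directory.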

Eventually, I came up with this design:

  1. A machine that samples the target repository and waits for a workflow_run event (like an alert) to notify me when an attack is in progress.
  2. Once a workflow is running, a malicious GitHub Actions workflow, which I named "RepoReaper," is launched.
  3. The RepoReaper workflow waits for the exact moment an artifact containing a leaked token is present.
  4. The RepoReaper workflow downloads the artifact, extracts the token and uses it to create a branch via the REST API on the target repository.
  5. The target repository is compromised. The new branch could just as easily have contained malicious code.

Then, I could use this design to search for and target open-source projects.

Projects I’ve Helped Secure

The research laid out here allowed me to compromise dozens of projects maintained by well-known organizations, including firebase-js-sdk by Google, a JavaScript package directly referenced by 1.6 million public projects, according to GitHub. Another high-profile project involved adsys, a tool included in the Ubuntu distribution used by corporations for integration with Active Directory.

All open-source projects I approached with this issue cooperated swiftly and patched their code. Some offered bounties and cool swag. Here’s a partial list of affected projects I’m allowed to disclose:

This research was reported to GitHub's bug bounty program. They categorized the issue as informational, placing the onus on users to secure their uploaded artifacts.

Stopping the Leak

My aim in this article is to highlight the potential for unintentionally exposing sensitive information through artifacts in GitHub Actions workflows. To address the concern, I developed a proof of concept (PoC) custom action that safeguards against such leaks.

The action uses the @actions/artifact package, which is also used by the upload-artifact GitHub action. It adds a crucial security layer by using an open-source scanner to audit the source directory for secrets and by blocking the artifact upload when there is a risk of accidental secret exposure. This approach promotes a more secure workflow environment.

You can find upload-secure-artifact on the Palo Alto Networks GitHub.

Figure 9. The action upload-secure-artifact failed the workflow due to the existence of a GITHUB_TOKEN in the uploaded artifact.
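The scan-then-block idea can be sketched in a few lines of Python. This is not the code of upload-secure-artifact, which relies on an open-source scanner; the sketch substitutes a single regular expression, and the pattern and function names are assumptions made for illustration.

```python
import os
import re

# Assumed indicators: a ghs_-prefixed token or a persisted basic-auth header.
SECRET_PATTERN = re.compile(
    rb"ghs_[A-Za-z0-9]{36,}|AUTHORIZATION:\s*basic\s+\S+", re.IGNORECASE
)


def find_leaks(root: str) -> list[str]:
    """Return relative paths of files under root that contain a secret-like string."""
    leaks = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable files are skipped in this sketch
            if SECRET_PATTERN.search(data):
                leaks.append(os.path.relpath(path, root))
    return sorted(leaks)


def guard_upload(root: str) -> None:
    """Raise instead of proceeding when the directory looks unsafe to upload."""
    leaks = find_leaks(root)
    if leaks:
        raise RuntimeError(f"refusing to upload, possible secrets in: {leaks}")
```

Calling guard_upload on the directory just before the upload step fails the job early, which is the behavior shown in Figure 9.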

Conclusion

As this research shows, we have a gap in the current security conversation regarding artifact scanning. GitHub's deprecation of Artifacts V3 should prompt organizations using the artifacts mechanism to reevaluate the way they use it.

Security defenders must adopt a holistic approach, meticulously scrutinizing every stage — from code to production — for potential vulnerabilities. Overlooked elements like build artifacts often become prime targets for attackers.

Reduce workflow permissions of runner tokens according to least privilege and review artifact creation in your CI/CD pipelines. By implementing a proactive and vigilant approach to security, defenders can significantly strengthen their project's security posture.
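One concrete review item is the checkout step: actions/checkout accepts a persist-credentials input, and setting it to false keeps the token out of the local git directory. The Python sketch below is a rough line-based lint that flags checkout steps lacking that setting. It is a heuristic under stated assumptions, not a YAML parser: it treats every line starting with "- " as the beginning of a step, and the function name is hypothetical.

```python
import re

CHECKOUT = re.compile(r"uses:\s*actions/checkout@")
STEP_START = re.compile(r"^\s*-\s")
PERSIST_OFF = re.compile(r"persist-credentials:\s*['\"]?false")


def checkouts_persisting_credentials(workflow_text: str) -> list[int]:
    """Return 1-based line numbers of checkout steps lacking persist-credentials: false."""
    lines = workflow_text.splitlines()
    flagged = []
    for index, line in enumerate(lines):
        if not CHECKOUT.search(line):
            continue
        # Walk back to the "- " line that opens this step.
        start = index
        while start > 0 and not STEP_START.match(lines[start]):
            start -= 1
        # Walk forward to the "- " line that opens the next step.
        end = index + 1
        while end < len(lines) and not STEP_START.match(lines[end]):
            end += 1
        if not any(PERSIST_OFF.search(step_line) for step_line in lines[start:end]):
            flagged.append(index + 1)
    return flagged


if __name__ == "__main__":
    sample = (
        "steps:\n"
        "  - uses: actions/checkout@v4\n"
        "  - uses: actions/checkout@v4\n"
        "    with:\n"
        "      persist-credentials: false\n"
    )
    print(checkouts_persisting_credentials(sample))
```

A proper implementation would parse the YAML and also check each job's permissions block and the paths passed to the upload step.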

Prisma Cloud and Other Palo Alto Networks Protection and Mitigation

Prisma Cloud detects vulnerable code that leaks the GITHUB_TOKEN within artifacts, equipping security teams to prevent attackers from using it to inject code into the repository, publish packages or trigger pipelines, all of which could result in malicious code reaching production. The platform also offers policies that significantly reduce the potential impact of a breach, for example by ensuring that pipelines are granted only minimum permissions.

Figure 10. Prisma Cloud detects vulnerable code that leaks the GITHUB_TOKEN within artifacts.

If you think you may have been compromised or have an urgent matter, get in touch with the Unit 42 Incident Response team or call:

  • North America Toll-Free: 866.486.4842 (866.4.UNIT42)
  • EMEA: +31.20.299.3130
  • APAC: +65.6983.8730
  • Japan: +81.50.1790.0200

Palo Alto Networks has shared these findings with our fellow Cyber Threat Alliance (CTA) members. CTA members use this intelligence to rapidly deploy protections to their customers and to systematically disrupt malicious cyber actors. Learn more about the Cyber Threat Alliance.



from Unit 42 https://ift.tt/d2CBreV
via IFTTT
