How Secrets Lurking in Source Code Lead to Major Breaches

If one word could sum up the 2021 infosecurity year (well,
actually three), it would be this: “supply chain attack”.

A software supply chain attack happens when hackers manipulate
the code in third-party software components to compromise the
‘downstream’ applications that use them. In 2021, we saw a
dramatic rise in such attacks: high-profile security incidents like
the SolarWinds, Kaseya, and Codecov[1] data breaches shook
enterprises’ confidence in the security practices of third-party
service providers.

What does this have to do with secrets, you might ask? In short,
a lot. Take the Codecov case (we’ll come back to it shortly): it is
a textbook example of how hackers leverage hardcoded credentials to
gain initial access to their victims’ systems and harvest more
secrets down the chain.

Secrets-in-code remains one of the most overlooked
vulnerabilities in the application security space, despite being a
priority target in hackers’ playbooks. In this article, we will
talk about secrets and how keeping them out of source code is
today’s number one priority to secure the software development
lifecycle.

What is a secret?

Secrets are digital authentication credentials (API keys,
certificates, tokens, etc.) that are used in applications, services
or infrastructures. Much like a password (plus a device in case of
2FA) is used to authenticate a person, a secret authenticates
systems to enable interoperability. But there is a catch: unlike
passwords, secrets are meant to be distributed.

To continually deliver new features, software engineering teams
need to interconnect more and more building blocks. Organizations
are watching the number of credentials in use across multiple teams
(development squads, SRE, DevOps, security, etc.) explode. Sometimes
developers keep keys in an insecure location to make the code easier
to change, but doing so often means the keys are forgotten and
inadvertently published.
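
As a minimal sketch (the SERVICE_API_KEY name and the key value
are made up for the example), compare a hardcoded credential with one
injected through the environment at runtime:

    import os
    import sys

    # Risky: the credential lives in the source file, and therefore in
    # every clone, fork, and commit that has ever contained this line.
    API_KEY = "sk_live_EXAMPLE_DO_NOT_USE"  # hardcoded secret (made-up value)

    # Safer: the credential is injected through the environment at runtime
    # and never enters version control.
    api_key = os.environ.get("SERVICE_API_KEY")  # variable name is an assumption
    if api_key is None:
        sys.exit("SERVICE_API_KEY is not set; refusing to start without it")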

In the application security landscape, hardcoded secrets are
really a different type of vulnerability. First, since source code
is a very leaky asset, meant to be cloned, checked out, and forked
on multiple machines very frequently, secrets are leaky too. But,
more worryingly, let’s not forget that code also has a memory.

Any codebase is managed with some kind of version control system
(VCS), keeping a historical timeline of all the modifications ever
made to it, sometimes over decades. The problem is that still-valid
secrets can be hiding anywhere on this timeline, opening a new
dimension to the attack surface. Unfortunately, most security
analyses are only done on the current, ready-to-be-deployed state
of a codebase. In other words, when it comes to credentials living
in an old commit or even a never-deployed branch, these tools are
totally blind.
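
To make the point concrete, here is a rough Python sketch, not any
particular vendor’s tool, that greps the diff of every commit on every
branch rather than just the current checkout; the single detection
pattern is deliberately simplistic:

    import re
    import subprocess

    # One deliberately simple detector; real scanners ship hundreds of patterns.
    GENERIC_KEY = re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"
    )

    # Dump the diff of every commit on every branch: a secret deleted years
    # ago is still sitting in one of these old diffs.
    history = subprocess.run(
        ["git", "log", "--all", "-p", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout

    for match in GENERIC_KEY.finditer(history):
        print("possible hardcoded credential:", match.group(0))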

Six million secrets pushed to GitHub

Last year, monitoring the commits pushed to GitHub in real time,
GitGuardian detected more than 6 million leaked secrets[2],
doubling the number from 2020. On average, 3 commits out of 1,000
contained a credential, a 50 percent increase over the previous
year.

A large share of those secrets gave access to corporate
resources. No wonder, then, that an attacker looking to gain a
foothold in an enterprise system would first look at its public
repositories on GitHub, and then at the ones owned by its
employees. Many developers use GitHub for personal projects and can
mistakenly leak corporate credentials (yes, it happens
regularly!).

With valid corporate credentials, attackers operate as
authorized users, and detecting abuse becomes difficult. A
credential pushed to GitHub can be compromised in as little as
4 seconds, meaning it should be immediately revoked and rotated to
neutralize the risk of a breach. Whether out of guilt or a lack of
technical knowledge, it is easy to see why people often take the
wrong path[3] to get out of this situation.

Another mistake enterprises make is tolerating the presence of
secrets inside non-public repositories. GitGuardian’s State of
Secrets Sprawl report highlights the fact that private repositories
hide many more secrets than their public equivalents. The
hypothesis here is that private repositories give their owners a
false sense of security, making them a bit less concerned about
potential secrets lurking in the codebase.

That’s ignoring the fact that these forgotten secrets could
someday have a devastating impact if harvested by hackers.

To be fair, application security teams are well aware of the
problem. But the amount of work to be done to investigate, revoke
and rotate the secrets committed every week, or dig through years
of uncharted territory, is simply overwhelming.

Headline breaches… and the rest

However, there is urgency. Hackers are actively looking for
“dorks” on GitHub: easily recognized patterns that identify leaked
secrets. And GitHub is not the only place where they hunt; any
registry (like Docker Hub) or any source code leak can potentially
become a goldmine of exploitation vectors.
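
To give an idea of what such patterns look like (a simplified
illustration built on publicly documented key prefixes, far from an
exhaustive dork list), they are easy to turn into search expressions:

    import re
    import sys

    # A few well-known key prefixes; real dork lists are far longer.
    DORKS = {
        "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
        "GitHub personal access token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
        "Stripe live secret key": re.compile(r"sk_live_[A-Za-z0-9]{10,}"),
        "Slack bot token": re.compile(r"xoxb-[A-Za-z0-9-]{10,}"),
    }

    # Scan any text handed to it, e.g. a file dumped from a leaked
    # repository or an extracted Docker image layer.
    with open(sys.argv[1], encoding="utf-8", errors="ignore") as f:
        text = f.read()

    for name, pattern in DORKS.items():
        for match in pattern.finditer(text):
            print(f"{name}: {match.group(0)}")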

As evidence, you just have to look at recently disclosed
breaches. Codecov, a code coverage tool and a favorite of many
open-source projects, was compromised last year by attackers who
gained access by extracting a static cloud account credential from
its official Docker image. Having successfully accessed the
official source code repository, they were able to tamper with a CI
script and harvest hundreds of secrets from Codecov’s user
base.

More recently, Twitch’s entire codebase was leaked, exposing
more than 6,000 Git repositories and 3 million documents. Despite
plenty of evidence of a certain level of AppSec maturity, nearly
7,000 secrets could be surfaced[4]! We are talking about hundreds
of AWS, Google, Stripe, and GitHub keys. Just a few of them would
be enough to deploy a full-scale attack on the company’s most
critical systems. This time no customer data was leaked, but that
was mostly luck.

A few years ago, Uber was not so lucky. An employee accidentally
published some corporate code to a public GitHub repository of his
own. Hackers found it and discovered a cloud service provider’s
keys granting access to Uber’s infrastructure. A massive breach
ensued.

The bottom line is that you can’t really be sure when a secret
will be exploited, but what you must be aware of is that malicious
actors are monitoring your developers, and they are looking for
your code. Also keep in mind that these incidents are just the tip
of the iceberg, and that probably many more breaches involving
secrets are not publicly disclosed.

Conclusion

Secrets are a core component of any software stack, and because
they are especially powerful, they require very strong protection.
Their distributed nature and modern software development practices
make it very hard to control where they end up, be it source code,
production logs, Docker images, or instant messaging apps. Secrets
detection and remediation capabilities are a must, because even a
single leaked secret can be exploited in an attack leading to a
major breach. Such scenarios happen every week, and as more and
more services and infrastructure are used in the enterprise world,
the number of leaks is growing at a very fast rate. The earlier
action is taken, the easier it is to protect source code from
future threats.
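
As an illustration of acting early (a toy sketch only, reusing the
hypothetical patterns above rather than any vendor’s actual tooling),
a Git pre-commit hook can refuse a commit whose staged changes look
like they add a credential:

    #!/usr/bin/env python3
    # Toy pre-commit hook: abort the commit if the staged diff appears to
    # add a credential. Save as .git/hooks/pre-commit and make it executable.
    import re
    import subprocess
    import sys

    SUSPICIOUS = re.compile(
        r"AKIA[0-9A-Z]{16}"            # AWS access key ID
        r"|ghp_[A-Za-z0-9]{36}"        # GitHub personal access token
        r"|sk_live_[A-Za-z0-9]{10,}"   # Stripe live secret key
    )

    # Only inspect lines being added by this commit.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    added = [line[1:] for line in diff.splitlines()
             if line.startswith("+") and not line.startswith("+++")]

    hits = [line.strip() for line in added if SUSPICIOUS.search(line)]
    if hits:
        print("Possible secret(s) in staged changes, aborting commit:")
        for hit in hits:
            print("  " + hit)
        sys.exit(1)  # a non-zero exit code makes Git abort the commit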

Note – This article is written by Thomas Segura,
technical content writer at GitGuardian. Thomas has worked as both
an analyst and software engineer consultant for various big French
companies.

References

  1. Codecov (blog.gitguardian.com)
  2. detected more than 6 million leaked secrets (blog.gitguardian.com)
  3. the wrong path (docs.gitguardian.com)
  4. nearly 7,000 secrets could be surfaced (blog.gitguardian.com)
