Inadvertently or otherwise, developers who use hard-coded passwords or SSH keys for testing purposes can forget to remove them: an open invitation for trouble.

Hard-coded secrets include any type of sensitive information embedded directly in code, such as usernames, passwords, SSH keys, and access tokens. If an application’s source code or configuration files contain them, they can easily leak to an attacker.

Even in organizations where hard-coding secrets is not accepted practice, a developer could still insert such a block of code for local testing purposes and forget to remove it later. When such Java source code is compiled, the resulting executable JAR file contains the username and password strings.
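A minimal sketch of the kind of leftover test block being described (all class names and credential values here are hypothetical):

```java
// Hypothetical local-test harness accidentally left in the codebase.
// The string literals below are stored verbatim in the compiled
// .class file's constant pool, so they ship inside the JAR.
public class DbSmokeTest {
    // Hard-coded credentials -- exactly the pattern secret scanners flag.
    static final String DB_USER = "admin";
    static final String DB_PASSWORD = "s3cr3t-local-only";

    // Builds a connection string that embeds both secrets.
    static String connectionString() {
        return "jdbc:postgresql://localhost:5432/app?user="
                + DB_USER + "&password=" + DB_PASSWORD;
    }

    public static void main(String[] args) {
        System.out.println(connectionString());
    }
}
```

Even if this class is never invoked in production, its literals remain in the artifact and are trivially recoverable.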

Wherever this executable is deployed, the strings can be scraped if an attacker gains access to its underlying storage. The attacker can then simply log in to a system using these easily obtained leaked passwords or access tokens.
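Scraping such strings requires no special tooling. The sketch below illustrates the idea behind the Unix `strings` utility: collect every run of printable ASCII characters above a minimum length from an artifact's raw bytes (the sample bytes are illustrative stand-ins for a compiled class file):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration of how hard-coded literals can be recovered from a
// binary artifact: extract every printable-ASCII run of minLen or
// more characters, the same idea as the Unix `strings` utility.
public class StringScraper {
    static List<String> scrape(byte[] artifact, int minLen) {
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte b : artifact) {
            if (b >= 0x20 && b < 0x7f) {   // printable ASCII byte
                run.append((char) b);
            } else {                        // run ended; keep if long enough
                if (run.length() >= minLen) found.add(run.toString());
                run.setLength(0);
            }
        }
        if (run.length() >= minLen) found.add(run.toString());
        return found;
    }

    public static void main(String[] args) {
        // Stand-in for bytes read from a compiled .class or JAR entry.
        byte[] blob = "\u0001\u0002password=hunter2!\u0000\u0003ok"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(scrape(blob, 6)); // recovers the credential run
    }
}
```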

Scope of the problem

Secrets leak often. Here are just a few recently reported large-scale incidents.

    • On 2 August 2022, it was revealed that Twitter API keys were leaked through 3,207 mobile apps. This type of leak allows an attacker to access various categories of sensitive information, including direct messages sent between Twitter users through associated apps.
    • On 1 September 2022, Symantec reported that 1,859 apps (iOS and Android) contained AWS tokens, 77% of which allowed access to private AWS cloud services, and 47% of which allowed access to files, often in the millions, in S3 buckets.
    • On 15 September 2022, Toyota disclosed that it had inadvertently leaked keys in source code uploaded to GitHub, which allowed an attacker to access 296,019 customer records containing email addresses.
    • One of the most high-profile data breaches was Target, which had 40 million of its shoppers’ credit and debit card records stolen. As a result, at least five lawsuits were filed seeking millions of dollars in damages across several states, and Target’s sales dropped by up to 4% compared to the prior-year period. When the dust finally settled a few years later, Target had spent US$202m in legal fees and damages, according to The New York Times.

If mechanisms had been in place to detect embedded secrets by scanning developers’ code within the IDE, during pull or merge requests, or in nightly scans, the leaks would likely have been caught. Detecting secrets earlier in DevOps workflows helps ensure they are not pushed downstream and reduces the cost and delays associated with remediation.

Types of hard-coded secrets
Understanding the three types of hard-coded secret can help developers increase awareness of the need to weed them out.

    1. Exposed passwords
      Explicit password strings in source code, whether in variable assignments, string comparisons, or any other manipulation, will leak into the final executable or app. Passwords in infrastructure-as-code configuration files, scripts, and other locations are also security risks, because a determined attacker can potentially reach the underlying storage through vulnerabilities in open source components, unsecured source code repositories, or other methods.
    2. SSH keys
      SSH keys are widely used to authenticate users to systems (and systems to users) via public key cryptography. The broader SSH protocol uses strong encryption to protect traffic and is commonly used on the internet for system access and file transfer. Leaking an administrator’s SSH key can give an attacker full access to that system.
    3. Access tokens
      These are often used in API, HTTP, and RPC calls on the internet to authorize access to a service on behalf of a user. A token carries credentials and other pertinent information (e.g., the resources being accessed) and is therefore itself a secret. On the server side, if a given token is allowed access to a resource or service and that token is presented in a request, the call succeeds.
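The server-side check described for access tokens can be sketched as a simple lookup from token to granted resources; this is why a leaked token is as good as a credential (all tokens and paths below are illustrative):

```java
import java.util.Map;
import java.util.Set;

// Sketch of server-side bearer-token authorization: a presented
// token grants exactly the resources bound to it, so anyone who
// obtains the token string obtains the access it confers.
public class TokenAuthorizer {
    // Token -> resources it may access (illustrative values only).
    private final Map<String, Set<String>> grants;

    TokenAuthorizer(Map<String, Set<String>> grants) {
        this.grants = grants;
    }

    // The call succeeds only if the token exists and covers the resource.
    boolean authorize(String bearerToken, String resource) {
        Set<String> allowed = grants.get(bearerToken);
        return allowed != null && allowed.contains(resource);
    }

    public static void main(String[] args) {
        TokenAuthorizer auth = new TokenAuthorizer(
                Map.of("tok-123", Set.of("/files", "/profile")));
        System.out.println(auth.authorize("tok-123", "/files"));  // true
        System.out.println(auth.authorize("tok-123", "/admin"));  // false
        System.out.println(auth.authorize("unknown", "/files"));  // false
    }
}
```

Note that the server cannot distinguish the legitimate holder of "tok-123" from an attacker who scraped it out of a published app.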

Looking for secrets

Secrets are typically introduced by individuals writing source code or configuration files, but there are many other stages in the software development life cycle where they can be inadvertently added. For example, during a deployment build, or while deploying to a staging environment for final testing, an automation script may inadvertently copy a set of files that contains secrets. It is important to account for these late-stage scenarios before final deployment into production.

When, and in what context, is it optimal to detect embedded secrets? The goal is to minimize the risk of pushing secrets downstream. Detecting secrets while working inside the IDE both minimizes the risk of publishing them and reduces the remediation effort; late-stage detection creates more work for everyone.

Minimizing false positives

Source code often contains hard-coded values that are used during testing but never make their way into the shipped product. Issues reported in these sections of the codebase are generally false positives, so it is important to define the contexts in which a secret in source code does not pose a risk in production. For example, security, DevOps, and engineering teams can configure Rapid Scan Static to explicitly include or exclude specific files or directories from scanning.

To increase accuracy, Rapid Scan Static does not rely on regex pattern matching alone to detect hard-coded secrets. Some secrets can be detected only with a semantic understanding of variables, values, and other context in the source language and configuration files; without it, they would be falsely excluded from analysis.

One way to avoid false positives is to eliminate the up-front work of specifying regex patterns or configuration entirely. This simplifies secrets detection and removes potential points of failure caused by misconfiguration or deviation from established patterns. Regex-only solutions, by contrast, tend to be noisy.
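The noise problem with pattern-only matching is easy to demonstrate. A naive scanner flags every assignment that merely looks like a credential; the pattern and samples below are illustrative, not Rapid Scan Static's actual rules:

```java
import java.util.List;
import java.util.regex.Pattern;

// Why regex-only secret detection is noisy: without semantic context,
// a pattern cannot tell a production credential from a test fixture
// or a documentation placeholder.
public class NaiveSecretScanner {
    // Flags any quoted string assigned to something named "password".
    static final Pattern PASSWORD_ASSIGNMENT =
            Pattern.compile("(?i)password\\s*=\\s*\"[^\"]+\"");

    static boolean flags(String line) {
        return PASSWORD_ASSIGNMENT.matcher(line).find();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "String password = \"s3cr3t-prod\";",                    // real finding
                "String password = \"dummy\"; // unit test fixture",     // false positive
                "String password = \"<YOUR_PASSWORD_HERE>\"; // sample"); // false positive
        for (String line : lines) {
            System.out.println(flags(line) + "  " + line);
        }
        // All three lines match: two of the three are false positives
        // that only semantic context could rule out.
    }
}
```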