Cloud Resources

Cloud storage misconfigurations are one of the easiest wins in modern engagements. Companies create S3 buckets / Azure blobs / GCS buckets for legitimate reasons, then mis-set the access policy and put production data in something the entire internet can read. The goal of this stage: find the buckets the organization owns, before touching them, so you can hand the customer a clean “your bucket X is publicly readable” finding.

# 1. DNS-resident clues - cloud storage hostnames in the organization's DNS
grep -E '(amazonaws|blob.core.windows|storage.googleapis)' subdomains.txt
# 2. Google dorks - find indexed bucket URLs (enter the query unquoted so site: is parsed as an operator)
site:s3.amazonaws.com inlanefreight
site:blob.core.windows.net inlanefreight
# 3. Source-code crawl - buckets referenced in JS/CSS/HTML of the main site
curl -s https://target.com | grep -Eo 'https?://[^"]*\.(amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com)[^"]*'
# 4. Third-party indexes - public buckets indexed by GrayHatWarfare and friends
# https://buckets.grayhatwarfare.com/ (search by company name or abbreviation)

Success indicator: a list of bucket URLs owned by the target, classified by access level (public, partial, restricted) and content category (static assets, backups, customer data, credentials).

Every major cloud provider exposes object storage that defaults to private but can be made public with a single config flag. The flag exists for a reason - static websites, public downloads, file sharing - but it gets set incorrectly all the time. A non-exhaustive list of what’s been found in public buckets across real engagements:

  • Customer PII (names, emails, addresses, SSNs)
  • Backups of production databases
  • API keys and cloud credentials (which compound the breach)
  • Private SSH keys (immediate lateral movement onto everything those keys authorize)
  • Source code with hardcoded secrets
  • Internal documents, contracts, sales pipelines
  • CI/CD artifacts including private container images

A single publicly readable backup bucket is often the entire engagement.

Cloud storage gets named via DNS for the same reason any other service does - humans need to find it. Look at your subdomain list from Domains & Subdomains for cloud-provider hostnames:

# After resolving subdomains to IPs (or to CNAMEs):
host -t cname assets.inlanefreight.com
assets.inlanefreight.com is an alias for s3-website-us-west-2.amazonaws.com.
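
To run the same check across a whole subdomain list, something like the following sketch could work. The function names (cloud_provider_for, find_cloud_aliases) and the provider labels are made up for illustration; the hostname suffixes come from the providers discussed on this page, and the parser assumes the "is an alias for" output format of host shown above.

```shell
# Map a CNAME target to a provider label; prints nothing for non-cloud targets.
cloud_provider_for() {
    case "${1%.}" in   # ${1%.} drops the trailing dot of a fully qualified name
        *.s3.amazonaws.com|*.s3[.-]*.amazonaws.com|s3[.-]*.amazonaws.com) echo "aws-s3" ;;
        *.blob.core.windows.net)  echo "azure-blob" ;;
        *.file.core.windows.net)  echo "azure-file" ;;
        storage.googleapis.com|*.storage.googleapis.com) echo "gcs" ;;
        *.digitaloceanspaces.com) echo "do-spaces" ;;
        *.backblazeb2.com)        echo "backblaze-b2" ;;
        *.linodeobjects.com)      echo "linode-object-storage" ;;
    esac
}

# Read subdomains (one per line) on stdin and print
# "subdomain<TAB>provider<TAB>cname" for every cloud-storage alias found.
find_cloud_aliases() {
    while IFS= read -r sub; do
        [ -n "$sub" ] || continue
        target=$(host -t cname "$sub" </dev/null 2>/dev/null \
            | sed -n 's/.* is an alias for \(.*\)$/\1/p' | head -n 1)
        [ -n "$target" ] || continue
        provider=$(cloud_provider_for "$target")
        if [ -n "$provider" ]; then
            printf '%s\t%s\t%s\n' "$sub" "$provider" "$target"
        fi
    done
}
```

Usage: find_cloud_aliases < subdomains.txt. Only the first alias per name is inspected, so CNAME chains that reach a storage endpoint through a CDN will be missed by this sketch.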

Or check the resolved IPs against known cloud ranges:

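# AWS publishes its IP ranges as JSON; keep only the S3 prefixes and their regions
curl -s https://ip-ranges.amazonaws.com/ip-ranges.json \
| jq -r '.prefixes[] | select(.service == "S3") | "\(.ip_prefix)\t\(.region)"'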
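
Checking a resolved address against those ranges comes down to masking the address and the prefix to the same length and comparing. A minimal sketch in POSIX shell arithmetic follows; ip_to_int, ip_in_cidr, and matching_prefixes are hypothetical helper names, only IPv4 is handled, and the prefix file is assumed to hold one CIDR per line (for example the ip_prefix values from the JSON).

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1          # split the address into its four octets
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit 0) when IPv4 address $1 falls inside CIDR block $2.
ip_in_cidr() (
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    if [ "$bits" -eq 0 ]; then
        mask=0
    else
        mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    fi
    [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Print every prefix in file $2 (one CIDR per line) that contains address $1.
matching_prefixes() {
    while IFS= read -r cidr; do
        case "$cidr" in */*) ;; *) continue ;; esac
        if ip_in_cidr "$1" "$cidr"; then echo "$cidr"; fi
    done < "$2"
}
```

A match only says the address sits in a provider range; shared ranges mean it does not by itself prove which customer owns the resource.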

When you find a *.amazonaws.com alias, the actual bucket name often follows a predictable pattern. s3-website-us-west-2.amazonaws.com is the regional endpoint; the bucket is usually accessed at one of:

  • <bucket>.s3.amazonaws.com
  • <bucket>.s3-us-west-2.amazonaws.com
  • <bucket>.s3-website-us-west-2.amazonaws.com

If assets.inlanefreight.com aliases to the S3 website endpoint, the bucket is very likely named assets.inlanefreight.com (the alias works because the bucket name matches the host header).
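
As a sketch of that inference, the helpers below take a hostname and its CNAME target and print the three endpoint forms listed above. The function names are hypothetical, and the region parser assumes the target follows the s3-website-<region> (or s3-website.<region>) naming.

```shell
# Extract the region from an S3 website endpoint such as
# s3-website-us-west-2.amazonaws.com (a trailing dot is tolerated).
region_from_website_endpoint() {
    printf '%s\n' "$1" \
        | sed -n 's/.*s3-website[.-]\([a-z0-9-]*\)\.amazonaws\.com\.\{0,1\}$/\1/p'
}

# Given a hostname ($1) that aliases to an S3 website endpoint ($2),
# assume the bucket is named after the hostname and print the endpoints to try.
candidate_endpoints() {
    region=$(region_from_website_endpoint "$2")
    [ -n "$region" ] || return 1
    printf '%s\n' \
        "$1.s3.amazonaws.com" \
        "$1.s3-$region.amazonaws.com" \
        "$1.s3-website-$region.amazonaws.com"
}
```

For example, candidate_endpoints assets.inlanefreight.com s3-website-us-west-2.amazonaws.com prints three hostnames to record as candidates for the later access checks.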

Provider      Storage type    Hostname pattern
AWS           S3              <bucket>.s3.amazonaws.com, <bucket>.s3-<region>.amazonaws.com
AWS           S3 website      <bucket>.s3-website-<region>.amazonaws.com
Azure         Blob storage    <account>.blob.core.windows.net/<container>/
Azure         File storage    <account>.file.core.windows.net/<share>/
GCP           Cloud Storage   storage.googleapis.com/<bucket>/, <bucket>.storage.googleapis.com
DigitalOcean  Spaces          <space>.<region>.digitaloceanspaces.com
Backblaze     B2              <bucket>.s3.<region>.backblazeb2.com (S3-compatible)
Linode        Object Storage  <bucket>.<region>.linodeobjects.com

Useful pattern: companies tend to name buckets after either the application, the environment, or the data. So you might see:

inlanefreight-assets
inlanefreight-backups
inlanefreight-prod
inlanefreight-staging
ilfreight-prod
ilf-data

Wordlist bucket discovery is covered later in this page.

Google has indexed massive amounts of bucket content. Cloud provider URLs follow recognizable patterns; you can search for them with site: operators and narrow by content keywords.

site:s3.amazonaws.com inlanefreight
site:s3-us-west-2.amazonaws.com "inlanefreight" filetype:pdf
inurl:s3.amazonaws.com "inlanefreight" intext:"confidential"
site:blob.core.windows.net "inlanefreight"
inurl:blob.core.windows.net filetype:xlsx "inlanefreight"
site:storage.googleapis.com "inlanefreight"
inurl:storage.googleapis.com "inlanefreight" filetype:json
"AKIA" filetype:txt # AWS access key prefix in text files
"AccountKey=" filetype:config # Azure storage account keys
"google_application_credentials" filetype:json

Real engagements turn up PDFs (contracts, proposals), spreadsheets (customer lists, invoices), text dumps, JSON configs, presentations, and source code in indexed buckets. The PII surface is often immediate and severe.
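
When several name variants are in play, the base queries can be generated rather than typed by hand. This is only a convenience sketch; gen_dorks is a made-up helper name and it covers just the three site: forms used above.

```shell
# Print one site: dork per keyword and storage domain.
gen_dorks() {
    for kw in "$@"; do
        for site in s3.amazonaws.com blob.core.windows.net storage.googleapis.com; do
            printf 'site:%s "%s"\n' "$site" "$kw"
        done
    done
}
```

Running gen_dorks inlanefreight ilfreight yields six queries; filetype: or intext: terms can be appended by hand where a narrower search is useful.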

Cloud storage URLs appear in HTML, CSS, and JavaScript of the company’s primary website. Images, fonts, scripts, and downloadable assets are loaded from buckets - that’s why the bucket exists in the first place.

# Pull the main page and extract cloud storage references
curl -sL https://target.com \
| grep -Eo 'https?://[^"]*\.(amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com|digitaloceanspaces\.com)[^"]*' \
| sort -u
https://cdn-images.inlanefreight.com.s3.amazonaws.com/banner.png
https://inlanefreight-static.s3-us-west-2.amazonaws.com/main.css
https://inlanefreight-docs.blob.core.windows.net/public/whitepaper.pdf

Each unique hostname is a bucket worth investigating. The full URL is what’s referenced; the bucket itself often has additional content beyond what’s linked in HTML.
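
Reducing a pile of extracted URLs to one line per bucket can be sketched as below. bucket_ids is a hypothetical helper; it assumes virtual-hosted URLs identify the bucket by hostname, while path-style S3, storage.googleapis.com, and Azure Blob URLs need the first path segment as well.

```shell
# Read URLs on stdin; print one identifier per distinct bucket or container.
bucket_ids() {
    awk -F/ '
        /^https?:\/\// {
            host = tolower($3)
            sub(/[?#:].*$/, "", host)          # drop port, query, fragment
            id = host
            # Path-style endpoints: the first path segment names the bucket/container.
            if (host ~ /^s3([.-][a-z0-9-]+)?\.amazonaws\.com$/ ||
                host == "storage.googleapis.com" ||
                host ~ /\.blob\.core\.windows\.net$/) {
                seg = $4
                sub(/[?#].*$/, "", seg)
                if (seg != "") id = host "/" seg
            }
            print id
        }' | sort -u
}
```

Piping the curl/grep output above through bucket_ids gives the list of distinct buckets to carry into the access checks.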

For comprehensive crawling, use a tool like waybackurls or gau to pull URLs the Wayback Machine has archived for the domain - historical references often include staging buckets that aren’t linked anymore.

# Match full provider domains; a bare "blob" would also hit unrelated URLs
waybackurls inlanefreight.com | grep -E 'amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com' | sort -u

GrayHatWarfare maintains a continuously updated index of public S3, Azure Blob, and GCS buckets along with their file contents. Search by:

  • Bucket name (partial match)
  • Filename / extension
  • Content type

This service has cataloged hundreds of thousands of misconfigured buckets. Search for the company name, then for common abbreviations (companies often abbreviate in internal naming: InlaneFreight → ilf → if). Search for likely file types:

inlanefreight → matches bucket names
ilf → matches abbreviations
inlanefreight .sql → backup dumps
inlanefreight .pem → private keys
inlanefreight id_rsa → SSH keys
inlanefreight .env → environment configs

Common high-value finds:

Search                             Why
.sql.gz .sql.bz2 .dump             Database backups
id_rsa *.pem *.ppk                 Private keys
.env config.json credentials.json  App credentials
*.bak backup.tar.gz                Generic backups
*.kdbx secrets.txt                 Password managers / secret stores

A single SSH key in a public bucket can mean instant root on production servers. Verify the bucket truly belongs to the target before reporting - bucket names sometimes collide.

Even when no DNS pointer or source-code link reveals a bucket, you can guess names by enumerating common patterns. Tools that automate this:

  • s3scanner - checks bucket existence + access level for a wordlist of names
  • cloud_enum - covers AWS, Azure, and GCP with one wordlist
  • Bucket Stream - passive discovery via certificate transparency monitoring
# cloud_enum example
python cloud_enum.py -k inlanefreight -k ilfreight -k ilf -k if

The tool generates permutations (inlanefreight-prod, prod-inlanefreight, inlanefreight-backup, inlanefreight.us-west-2, and so on) and checks each against AWS, Azure, and GCP in one run.

{company}-prod {company}-staging {company}-dev
{company}-backup {company}-backups {company}-bak
{company}-data {company}-public {company}-assets
{company}-uploads {company}-files {company}-images
{company}-logs {company}-archive {company}-temp
{company}.{env} {env}.{company} {company}{env}

Append regions for AWS: -us-east-1, -us-west-2, -eu-west-1, etc.
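
If you want a wordlist to feed into a scanner, the patterns above can be expanded with a few loops. This is a sketch, not what cloud_enum does internally; bucket_candidates and with_regions are hypothetical names, and the suffix and region lists are just the ones shown on this page.

```shell
# Print candidate bucket names for every keyword passed as an argument.
bucket_candidates() {
    for company in "$@"; do
        echo "$company"
        for env in prod staging dev backup backups bak data public assets \
                   uploads files images logs archive temp; do
            printf '%s\n' "$company-$env" "$env-$company" \
                          "$company.$env" "$env.$company" "$company$env"
        done
    done | sort -u
}

# Read names on stdin; print each name plus a variant per AWS region suffix.
with_regions() {
    while IFS= read -r name; do
        echo "$name"
        for region in us-east-1 us-west-2 eu-west-1; do
            echo "$name-$region"
        done
    done
}
```

Usage: bucket_candidates inlanefreight ilfreight ilf | with_regions > candidates.txt. Names shorter than three characters are not valid S3 bucket names, so very short abbreviations are mainly useful as part of a longer permutation.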

domain.glass is a quick infrastructure aggregator. Useful for spotting cloud usage at a glance - it shows DNS, TLS, hosting, CDN classification, and a Cloudflare “Safe” / “Suspicious” rating in one view. The Cloudflare verdict is a good cue for layer-2 gateway notes: if Cloudflare protects the main site, you need to think about WAF bypass on later stages.

For each bucket you discover that belongs to the target:

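  1. List contents - aws s3 ls s3://bucket --no-sign-request for AWS; for Azure, request https://X.blob.core.windows.net/Y?restype=container&comp=list anonymously (browser or curl). az storage blob list --account-name X --container-name Y --auth-mode login runs under your own signed-in identity, so a successful listing there does not by itself prove public access.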
  2. Classify access - public read, public read+write, requester-pays, fully private. The access policy is the finding.
  3. Sample content - what type of data is in there? Static website assets? Customer files? Backups?
  4. Note credentials within content - config files, .env, scripts often contain hardcoded keys.
  5. Don’t exfiltrate - confirm the bucket is open, document the access level, take screenshots, don’t pull gigabytes of customer data unless explicitly authorized.
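
For S3, the access classification in step 2 can start from the HTTP status of one unauthenticated request to the bucket root. The sketch below uses hypothetical names (classify_status, probe_s3_bucket) and labels; it assumes the usual S3 behaviour of 200 for a listable bucket, 403 for an existing but restricted one, and 404 for a name that does not exist. It requests only the root, so it downloads no objects.

```shell
# Translate the status of an anonymous GET on a bucket root into a label.
classify_status() {
    case "$1" in
        200)     echo "public-list" ;;        # listing (or a website index) was served
        403)     echo "exists-restricted" ;;  # bucket exists, anonymous listing denied
        404)     echo "not-found" ;;          # no bucket with that name
        301|307) echo "exists-redirect" ;;    # bucket exists, served from another endpoint
        *)       echo "unknown" ;;
    esac
}

# Probe one bucket name and print "name<TAB>status<TAB>label".
probe_s3_bucket() (
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
        "https://$1.s3.amazonaws.com/")
    printf '%s\t%s\t%s\n' "$1" "$code" "$(classify_status "$code")"
)
```

A 403 on the root does not rule out individually readable objects, and bucket names containing dots will fail TLS hostname validation on this URL form, so treat the label as a first pass and confirm with aws s3 ls as in step 1.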

The finding writes itself: “Bucket inlanefreight-backups is publicly readable and contains [N] database backup files spanning [date range]. Recommended remediation: set bucket ACL to private, audit access logs.”

A specific high-value finding worth calling out: SSH private keys in public buckets are a recurring pattern. Engineers under pressure store id_rsa somewhere “temporary” for a quick transfer; the bucket gets made public for a different file; the key sits there indefinitely.

If you find an SSH private key in a bucket:

  1. Identify what host it authorizes - try ssh -i found_key user@target_host
  2. Document the public key fingerprint and which hosts accept it
  3. Report immediately - this is a high-severity finding regardless of what else you do
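
For step 2, the fingerprint can be taken from the private key without ever connecting to a host. A minimal sketch, assuming OpenSSH's ssh-keygen is installed and the key is not passphrase-protected; key_fingerprint is a hypothetical helper name.

```shell
# Print the fingerprint of the public key that belongs to private key file $1.
key_fingerprint() (
    pub=$(mktemp) || exit 1
    trap 'rm -f "$pub"' EXIT
    chmod 600 "$1"     # ssh-keygen refuses private keys with open permissions
    # -y derives the public key; -P '' supplies an empty passphrase so an
    # encrypted key fails instead of prompting.
    if ! ssh-keygen -y -P '' -f "$1" > "$pub" 2>/dev/null; then
        echo "could not read key (encrypted or malformed)" >&2
        exit 1
    fi
    ssh-keygen -l -f "$pub"
)
```

The fact that the key loads with an empty passphrase is itself worth noting in the finding, since it means anyone who downloaded the file can use it directly.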

The same applies to cloud credentials (AKIA... access keys, Azure storage account keys, GCP service account JSONs). Don’t use the credentials beyond confirming they are live: a single read-only identity call such as aws sts get-caller-identity (or the provider's equivalent) is enough, then report.
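
Spotting these in sampled files can be done with a handful of patterns. The sketch below is illustrative rather than exhaustive: scan_secrets is a made-up name, and the regular expressions are assumptions based on the common shapes of these secrets (AKIA plus 16 uppercase alphanumerics, AccountKey= followed by base64, PEM private key headers, and the "type": "service_account" marker in GCP key files).

```shell
# Print "line:content" (or "file:line:content" for several files) for every
# line that looks like it holds a credential. Exit status is 1 when nothing matches.
scan_secrets() {
    grep -En \
        -e 'AKIA[0-9A-Z]{16}' \
        -e 'AccountKey=[A-Za-z0-9+/=]{20,}' \
        -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
        -e '"type": *"service_account"' \
        "$@"
}
```

Usage: scan_secrets sampled/*.env sampled/*.json. The output contains the secrets themselves, so redact it before it goes into a report.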

A clean cloud-recon workflow:

1. Filter subdomain list → identify cloud-hosted hostnames from DNS data
2. Crawl primary site → extract bucket URLs from HTML/JS/CSS
3. Wayback archive → historical bucket references
4. Google dorks → indexed bucket content
5. GrayHatWarfare → public bucket index, by company name + abbreviations
6. cloud_enum brute → wordlist-based bucket discovery
7. Per bucket: access test → public/restricted classification
8. Per bucket: content sample → categorize, identify high-value content
9. Per bucket: secret scan → keys, credentials, PII within bucket contents

The output: a per-bucket inventory with access level, content category, and any embedded credentials. Buckets with credentials or PII become immediate findings; buckets that just host static assets get noted for completeness.