Cloud Resources

TL;DR

Cloud storage misconfigurations are one of the easiest wins in modern engagements. Companies create S3 buckets / Azure blobs / GCS buckets for legitimate reasons, then mis-set the access policy and put production data in something the entire internet can read. The goal of this stage: find the buckets the organization owns, before touching them, so you can hand the customer a clean “your bucket X is publicly readable” finding.

# 1. DNS-resident clues - cloud storage hostnames in the organization's DNS
grep -E '(amazonaws|blob.core.windows|storage.googleapis)' subdomains.txt

# 2. Google dorks - find indexed bucket URLs
site:s3.amazonaws.com "inlanefreight"
site:blob.core.windows.net "inlanefreight"

# 3. Source-code crawl - buckets referenced in JS/CSS/HTML of the main site
curl -s https://target.com | grep -Eo 'https?://([^"/]*\.)?(amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com)[^"]*'

# 4. Third-party indexes - public buckets indexed by GrayHatWarfare and friends
# https://buckets.grayhatwarfare.com/  (search by company name or abbreviation)

Success indicator: a list of bucket URLs owned by the target, classified by access level (public, partial, restricted) and content category (static assets, backups, customer data, credentials).

Why cloud storage matters

Every major cloud provider exposes object storage that defaults to private but can be made public with a single config flag. The flag exists for a reason - static websites, public downloads, file sharing - but it gets set incorrectly all the time. A non-exhaustive list of what’s been found in public buckets across real engagements:

Customer PII (names, emails, addresses, SSNs)
Backups of production databases
API keys and cloud credentials (which compound the breach)
Private SSH keys (immediate lateral movement onto everything those keys authorize)
Source code with hardcoded secrets
Internal documents, contracts, sales pipelines
CI/CD artifacts including private container images

A single publicly readable backup bucket is often the entire engagement.

Identifying cloud storage from DNS

Cloud storage gets named via DNS for the same reason any other service does - humans need to find it. Look at your subdomain list from Domains & Subdomains for cloud-provider hostnames:

# After resolving subdomains to IPs (or to CNAMEs):
host -t cname assets.inlanefreight.com

assets.inlanefreight.com is an alias for s3-website-us-west-2.amazonaws.com.

Or check the resolved IPs against known cloud ranges:

# AWS publishes its IP ranges as JSON; list only the S3 prefixes
curl -s https://ip-ranges.amazonaws.com/ip-ranges.json \
  | jq -r '.prefixes[] | select(.service == "S3") | .ip_prefix'
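
To check a resolved IP against the published ranges, something like the following sketch could work. It assumes the JSON has already been downloaded and parsed, and that each entry in its prefixes list carries ip_prefix, region, and service keys. The function name match_aws_prefixes and the sample data are illustrative only; the sample uses documentation-reserved ranges, not real AWS prefixes.

```python
import ipaddress


def match_aws_prefixes(ip, prefixes):
    """Return the entries of an ip-ranges.json 'prefixes' list that contain ip."""
    addr = ipaddress.ip_address(ip)
    hits = []
    for entry in prefixes:
        network = ipaddress.ip_network(entry["ip_prefix"])
        if addr in network:
            hits.append(entry)
    return hits


# Made-up sample shaped like ip-ranges.json (TEST-NET ranges, NOT real AWS data)
sample_prefixes = [
    {"ip_prefix": "192.0.2.0/24", "region": "us-west-2", "service": "S3"},
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"},
]

for hit in match_aws_prefixes("192.0.2.10", sample_prefixes):
    print(hit["service"], hit["region"])
```

A hit on an S3 prefix only suggests the host sits behind S3; the CNAME is still the stronger signal for the bucket name.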

When you find a *.amazonaws.com alias, the actual bucket name often follows a predictable pattern. s3-website-us-west-2.amazonaws.com is the regional website endpoint; the bucket is usually reachable at one of:

<bucket>.s3.amazonaws.com
<bucket>.s3.us-west-2.amazonaws.com
<bucket>.s3-us-west-2.amazonaws.com
<bucket>.s3-website-us-west-2.amazonaws.com

If assets.inlanefreight.com aliases to the S3 website endpoint, the bucket is very likely named assets.inlanefreight.com (the alias works because the bucket name matches the host header).
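
The inference above can be sketched as a small helper. This is a minimal illustration, assuming the CNAME target has already been resolved (for example with host -t cname); the function name infer_bucket and the regular expression are this sketch's own, not part of any tool.

```python
import re

# Accepts both website endpoint styles: s3-website-<region> and s3-website.<region>.
# An optional "<bucket>." prefix is captured when the CNAME target includes it.
S3_WEBSITE_RE = re.compile(
    r"^(?:(?P<bucket>.+)\.)?s3-website[-.](?P<region>[a-z0-9-]+)\.amazonaws\.com\.?$"
)


def infer_bucket(alias_host, cname_target):
    """Guess the S3 bucket behind a hostname that aliases to an S3 website endpoint."""
    match = S3_WEBSITE_RE.match(cname_target.lower())
    if match is None:
        return None  # not an S3 website endpoint
    # When the target has no bucket prefix, the bucket name must equal the alias host.
    bucket = match.group("bucket") or alias_host.lower().rstrip(".")
    return {"bucket": bucket, "region": match.group("region")}


print(infer_bucket("assets.inlanefreight.com", "s3-website-us-west-2.amazonaws.com."))
```

The result is a guess to verify, not proof of ownership.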

Provider naming conventions

Provider	Storage type	Hostname pattern
AWS	S3	`<bucket>.s3.amazonaws.com`, `<bucket>.s3.<region>.amazonaws.com`, `<bucket>.s3-<region>.amazonaws.com` (legacy)
AWS	S3 website	`<bucket>.s3-website-<region>.amazonaws.com`, `<bucket>.s3-website.<region>.amazonaws.com`
Azure	Blob storage	`<account>.blob.core.windows.net/<container>/`
Azure	File storage	`<account>.file.core.windows.net/<share>/`
GCP	Cloud Storage	`storage.googleapis.com/<bucket>/`, `<bucket>.storage.googleapis.com`
DigitalOcean	Spaces	`<space>.<region>.digitaloceanspaces.com`
Backblaze	B2	`<bucket>.s3.<region>.backblazeb2.com` (S3-compatible)
Linode	Object Storage	`<bucket>.<region>.linodeobjects.com`
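
The hostname patterns in the table can be turned into a rough classifier. The sketch below is an approximation under stated assumptions: the regular expressions are hand-written from the table rather than taken from provider documentation, the name identify_storage is illustrative, and path-style S3 URLs are deliberately not handled. Azure storage account names are limited to lowercase letters and digits, which the Azure patterns reflect.

```python
import re
from urllib.parse import urlparse

# (provider label, pattern capturing the bucket / account / space name)
HOST_PATTERNS = [
    ("AWS S3", re.compile(r"^(?P<name>.+)\.s3(?:-website)?(?:[.-][a-z0-9-]+)?\.amazonaws\.com$")),
    ("Azure Blob", re.compile(r"^(?P<name>[a-z0-9]{3,24})\.blob\.core\.windows\.net$")),
    ("Azure Files", re.compile(r"^(?P<name>[a-z0-9]{3,24})\.file\.core\.windows\.net$")),
    ("GCP Cloud Storage", re.compile(r"^(?P<name>.+)\.storage\.googleapis\.com$")),
    ("DigitalOcean Spaces", re.compile(r"^(?P<name>.+)\.[a-z0-9]+\.digitaloceanspaces\.com$")),
    ("Backblaze B2", re.compile(r"^(?P<name>.+)\.s3\.[a-z0-9-]+\.backblazeb2\.com$")),
    ("Linode Object Storage", re.compile(r"^(?P<name>.+)\.[a-z0-9-]+\.linodeobjects\.com$")),
]


def identify_storage(url_or_host):
    """Return (provider, name) for a cloud storage URL or hostname, else None."""
    if "://" not in url_or_host:
        url_or_host = "https://" + url_or_host
    parsed = urlparse(url_or_host)
    host = (parsed.hostname or "").lower()
    # Path-style GCS: storage.googleapis.com/<bucket>/...
    if host == "storage.googleapis.com":
        segments = [s for s in parsed.path.split("/") if s]
        return ("GCP Cloud Storage", segments[0]) if segments else None
    for provider, pattern in HOST_PATTERNS:
        match = pattern.match(host)
        if match:
            return (provider, match.group("name"))
    return None


print(identify_storage("https://inlanefreight-static.s3-us-west-2.amazonaws.com/main.css"))
```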

Useful pattern: companies tend to name buckets after either the application, the environment, or the data. So you might see:

inlanefreight-assets
inlanefreight-backups
inlanefreight-prod
inlanefreight-staging
ilfreight-prod
ilf-data

Wordlist bucket discovery is covered later in this page.

Google dorks

Google has indexed massive amounts of bucket content. Cloud provider URLs follow recognizable patterns; you can search for them with site: operators and narrow by content keywords.

AWS S3

site:s3.amazonaws.com inlanefreight
site:s3-us-west-2.amazonaws.com "inlanefreight" filetype:pdf
inurl:s3.amazonaws.com "inlanefreight" intext:"confidential"

Azure Blob

site:blob.core.windows.net "inlanefreight"
inurl:blob.core.windows.net filetype:xlsx "inlanefreight"

GCP

site:storage.googleapis.com "inlanefreight"
inurl:storage.googleapis.com "inlanefreight" filetype:json

General cloud bucket dorking

"AKIA" filetype:txt              # AWS access key prefix in text files
"AccountKey=" filetype:config    # Azure storage account keys
"google_application_credentials" filetype:json

Real engagements turn up PDFs (contracts, proposals), spreadsheets (customer lists, invoices), text dumps, JSON configs, presentations, and source code in indexed buckets. The PII surface is often immediate and severe.
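
Writing dorks by hand for every keyword and provider gets repetitive, so a small generator can help. This sketch only builds query strings from the site: and filetype: operators shown above; it does not submit anything to a search engine. The function name build_dorks and the default file types are assumptions made for illustration.

```python
# Provider hosts taken from the dork examples in this section
STORAGE_SITES = ["s3.amazonaws.com", "blob.core.windows.net", "storage.googleapis.com"]


def build_dorks(keywords, filetypes=("pdf", "xlsx", "json")):
    """Build site:-scoped dork strings for each keyword, with and without filetype filters."""
    dorks = []
    for keyword in keywords:
        for site in STORAGE_SITES:
            dorks.append(f'site:{site} "{keyword}"')
            for filetype in filetypes:
                dorks.append(f'site:{site} "{keyword}" filetype:{filetype}')
    return dorks


for dork in build_dorks(["inlanefreight", "ilfreight"], filetypes=("pdf",)):
    print(dork)
```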

Source-code references

Cloud storage URLs appear in HTML, CSS, and JavaScript of the company’s primary website. Images, fonts, scripts, and downloadable assets are loaded from buckets - that’s why the bucket exists in the first place.

# Pull the main page and extract cloud storage references
curl -sL https://target.com \
  | grep -Eo 'https?://([^"/]*\.)?(amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com|digitaloceanspaces\.com)[^"]*' \
  | sort -u

https://cdn-images.inlanefreight.com.s3.amazonaws.com/banner.png
https://inlanefreight-static.s3-us-west-2.amazonaws.com/main.css
https://inlanefreightdocs.blob.core.windows.net/public/whitepaper.pdf

Each unique hostname is a bucket worth investigating. The full URL is what’s referenced; the bucket itself often has additional content beyond what’s linked in HTML.
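
When the page source is already saved locally, the same extraction can be done without grep. The sketch below assumes the HTML is available as a string; the name extract_storage_hosts is illustrative. Unlike a pattern that excludes only double quotes, it also stops at single quotes, whitespace, and angle brackets.

```python
import re
from urllib.parse import urlparse

# URL whose host ends in a known storage domain; stops at quotes, whitespace, < and >
STORAGE_URL_RE = re.compile(
    r"""https?://(?:[^\s"'<>/]*\.)?"""
    r"""(?:amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com|digitaloceanspaces\.com)"""
    r"""[^\s"'<>]*"""
)


def extract_storage_hosts(html):
    """Return the sorted, de-duplicated storage hostnames referenced in html."""
    hosts = set()
    for url in STORAGE_URL_RE.findall(html):
        host = urlparse(url).hostname
        if host:
            hosts.add(host)
    return sorted(hosts)


sample_html = (
    '<img src="https://cdn-images.inlanefreight.com.s3.amazonaws.com/banner.png">'
    "<link href='https://inlanefreight-static.s3-us-west-2.amazonaws.com/main.css'>"
    '<a href="https://www.inlanefreight.com/about">About</a>'
)
print(extract_storage_hosts(sample_html))
```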

For comprehensive crawling, use a tool like waybackurls or gau to pull URLs the Wayback Machine has archived for the domain - historical references often include staging buckets that aren’t linked anymore.

waybackurls inlanefreight.com | grep -E 'amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com' | sort -u

GrayHatWarfare

GrayHatWarfare maintains a continuously updated index of public S3, Azure Blob, and GCS buckets along with their file contents. Search by:

Bucket name (partial match)
Filename / extension
Content type

This service has cataloged hundreds of thousands of misconfigured buckets. Search for the company name, then for common abbreviations (companies often abbreviate in internal naming - InlaneFreight → ilf → if). Search for likely file types:

inlanefreight        → matches bucket names
ilf                  → matches abbreviations
inlanefreight .sql   → backup dumps
inlanefreight .pem   → private keys
inlanefreight id_rsa → SSH keys
inlanefreight .env   → environment configs

Common high-value finds:

Search	Why
`.sql.gz` `.sql.bz2` `.dump`	Database backups
`id_rsa` `.pem` `.ppk`	Private keys
`.env` `config.json` `credentials.json`	App credentials
`*.bak` `backup.tar.gz`	Generic backups
`*.kdbx` `secrets.txt`	Password managers / secret stores

A single SSH key in a public bucket can mean instant root on production servers. Verify the bucket truly belongs to the target before reporting - bucket names sometimes collide.
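
Once a file listing has been exported from the index, the table of high-value finds can be applied mechanically. This is a rough sketch: the category labels mirror the table, the suffix matching is deliberately simple, and the names HIGH_VALUE_RULES and categorize are illustrative rather than part of any GrayHatWarfare tooling.

```python
# (category, filename endings) taken from the high-value table above
HIGH_VALUE_RULES = [
    ("database backup", (".sql", ".sql.gz", ".sql.bz2", ".dump")),
    ("private key", ("id_rsa", ".pem", ".ppk")),
    ("app credentials", (".env", "config.json", "credentials.json")),
    ("generic backup", (".bak", "backup.tar.gz")),
    ("secret store", (".kdbx", "secrets.txt")),
]


def categorize(path):
    """Return the high-value category for a file path, or None if nothing matches."""
    name = path.lower().rsplit("/", 1)[-1]  # compare on the filename only
    for category, endings in HIGH_VALUE_RULES:
        if name.endswith(endings):
            return category
    return None


for path in ["dumps/prod-2023.sql.gz", "keys/id_rsa", "img/logo.png"]:
    print(path, "->", categorize(path))
```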

Bucket-name brute-force

Even when no DNS pointer or source-code link reveals a bucket, you can guess names by enumerating common patterns. Tools that automate this:

s3scanner - checks bucket existence + access level for a wordlist of names
cloud_enum - covers AWS, Azure, and GCP with one wordlist
Bucket Stream - passive discovery via certificate transparency monitoring

# cloud_enum example
python cloud_enum.py -k inlanefreight -k ilfreight -k ilf -k if

The tool generates permutations such as inlanefreight-prod, prod-inlanefreight, inlanefreight-backup, and inlanefreight.us-west-2, and checks each against AWS, Azure, and GCP in a single run.

Wordlist patterns that work

{company}-prod        {company}-staging      {company}-dev
{company}-backup      {company}-backups      {company}-bak
{company}-data        {company}-public       {company}-assets
{company}-uploads     {company}-files        {company}-images
{company}-logs        {company}-archive      {company}-temp
{company}.{env}       {env}.{company}        {company}{env}

Append regions for AWS: -us-east-1, -us-west-2, -eu-west-1, etc.
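
The patterns above can be expanded into a candidate list to feed a scanner. This is a minimal sketch, not a reproduction of how cloud_enum or s3scanner build their permutations; the word lists are copied from the patterns above, and the function name candidate_names is illustrative.

```python
from itertools import product

WORDS = [
    "prod", "staging", "dev", "backup", "backups", "bak", "data", "public",
    "assets", "uploads", "files", "images", "logs", "archive", "temp",
]
ENVS = ["prod", "staging", "dev"]


def candidate_names(company, regions=()):
    """Expand a company keyword into candidate bucket names, optionally with region suffixes."""
    names = [f"{company}-{word}" for word in WORDS]
    for env in ENVS:
        names += [f"{company}.{env}", f"{env}.{company}", f"{company}{env}"]
    # Append each region to every base name (iterate over a copy while extending)
    for base, region in product(list(names), regions):
        names.append(f"{base}-{region}")
    return list(dict.fromkeys(names))  # de-duplicate, keep order


names = candidate_names("inlanefreight", regions=("us-west-2",))
print(len(names), names[:3])
```

Run it once per keyword (company name, abbreviations) and merge the results before scanning.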

domain.glass

domain.glass is a quick infrastructure aggregator. Useful for spotting cloud usage at a glance - it shows DNS, TLS, hosting, CDN classification, and a Cloudflare “Safe” / “Suspicious” rating in one view. The Cloudflare verdict is a good cue for layer-2 gateway notes: if Cloudflare protects the main site, you need to think about WAF bypass in later stages.

What to do with a bucket you find

For each bucket you discover that belongs to the target:

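List contents - aws s3 ls s3://bucket --no-sign-request for AWS. For Azure, request https://X.blob.core.windows.net/Y?restype=container&comp=list anonymously (curl or a browser). Note that az storage blob list --account-name X --container-name Y --auth-mode login authenticates as your own Azure identity, so it does not demonstrate anonymous access.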
Classify access - public read, public read+write, requester-pays, fully private. The access policy is the finding.
Sample content - what type of data is in there? Static website assets? Customer files? Backups?
Note credentials within content - config files, .env, scripts often contain hardcoded keys.
Don’t exfiltrate - confirm the bucket is open, document the access level, take screenshots, don’t pull gigabytes of customer data unless explicitly authorized.

The finding writes itself: “Bucket inlanefreight-backups is publicly readable and contains [N] database backup files spanning [date range]. Recommended remediation: remove public access (for S3, enable Block Public Access and tighten the bucket policy and ACLs), then audit access logs for prior downloads.”
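
The "classify access" step can be partly automated by interpreting the response to an unauthenticated GET on the bucket root. The sketch below covers S3 only and takes the status code and body as inputs, so it makes no network calls itself. The mapping reflects commonly observed S3 behaviour and should be treated as a heuristic; the labels and the name classify_s3_response are illustrative.

```python
def classify_s3_response(status, body):
    """Map an anonymous GET on an S3 bucket root to a coarse access label."""
    if status == 200 and "<ListBucketResult" in body:
        return "public-list"  # anyone can enumerate the bucket's keys
    if status == 403:
        # Bucket exists, anonymous listing denied; individual objects may still be public
        return "exists-no-anonymous-list"
    if status == 404 and "NoSuchBucket" in body:
        return "not-found"
    if status in (301, 307):
        return "exists-other-region"  # redirect to the bucket's regional endpoint
    return "unknown"


print(classify_s3_response(200, "<ListBucketResult><Name>inlanefreight-backups</Name></ListBucketResult>"))
print(classify_s3_response(403, "<Error><Code>AccessDenied</Code></Error>"))
```

Write access is a separate test and should only be attempted when the rules of engagement allow it.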

SSH keys and credentials in cloud buckets

A specific high-value finding worth calling out: SSH private keys in public buckets are a recurring pattern. Engineers under pressure store id_rsa somewhere “temporary” for a quick transfer; the bucket gets made public for a different file; the key sits there indefinitely.

If you find an SSH private key in a bucket:

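Identify what host it authorizes - try ssh -i found_key user@target_host, against in-scope hosts only
Document the public key fingerprint (derive the public key with ssh-keygen -y -f found_key > found_key.pub, then run ssh-keygen -lf found_key.pub) and which hosts accept it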
Report immediately - this is a high-severity finding regardless of what else you do

The same applies to cloud credentials (AKIA... access keys, Azure storage account keys, GCP service account JSONs). Don’t use the credentials beyond confirming they are valid - run aws sts get-caller-identity or the provider’s equivalent, then stop and report.
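
Spotting these credential types in sampled files can be sketched with a few regular expressions. The patterns below are simplified assumptions (an AKIA prefix followed by 16 uppercase alphanumerics, PEM-style private key headers, an AccountKey= connection-string fragment, a service_account type field) and will produce both false positives and misses. The name scan_text is illustrative, and the sample uses AWS's documented example key ID, not a real one.

```python
import re

SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
    "azure_storage_key": re.compile(r"AccountKey=[A-Za-z0-9+/=]{20,}"),
    "gcp_service_account": re.compile(r'"type"\s*:\s*"service_account"'),
}


def scan_text(text):
    """Return (line_number, label) pairs; the secret itself is never copied out."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


sample = "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\nDEBUG=false\n-----BEGIN OPENSSH PRIVATE KEY-----"
print(scan_text(sample))
```

Recording only the location and type keeps secret values out of working notes; the evidence can reference the object path and line instead.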

Putting it together

A clean cloud-recon workflow:

1. Filter subdomain list   → identify cloud-hosted hostnames from DNS data
2. Crawl primary site      → extract bucket URLs from HTML/JS/CSS
3. Wayback archive         → historical bucket references
4. Google dorks            → indexed bucket content
5. GrayHatWarfare          → public bucket index, by company name + abbreviations
6. cloud_enum brute        → wordlist-based bucket discovery
7. Per bucket: access test → public/restricted classification
8. Per bucket: content sample → categorize, identify high-value content
9. Per bucket: secret scan → keys, credentials, PII within bucket contents

The output: a per-bucket inventory with access level, content category, and any embedded credentials. Buckets with credentials or PII become immediate findings; buckets that just host static assets get noted for completeness.
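
One possible shape for that inventory is sketched below. The field names and the rule for what counts as an immediate finding are assumptions drawn from the paragraph above (embedded credentials or PII-bearing content), not a prescribed schema; BucketRecord and triage are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class BucketRecord:
    url: str
    access: str    # e.g. "public", "partial", "restricted"
    category: str  # e.g. "static assets", "backups", "customer data", "credentials"
    secrets: list = field(default_factory=list)  # labels of embedded credentials found

    @property
    def immediate_finding(self):
        # Credentials or PII make the bucket an immediate finding
        return bool(self.secrets) or self.category in {"customer data", "credentials"}


def triage(records):
    """Split records into (immediate findings, noted for completeness)."""
    immediate = [r for r in records if r.immediate_finding]
    noted = [r for r in records if not r.immediate_finding]
    return immediate, noted


inventory = [
    BucketRecord("https://inlanefreight-static.s3-us-west-2.amazonaws.com", "public", "static assets"),
    BucketRecord("https://inlanefreight-backups.s3.amazonaws.com", "public", "backups", ["aws_access_key_id"]),
]
immediate, noted = triage(inventory)
print(len(immediate), len(noted))
```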

Defenses: D3-NTA (Network Traffic Analysis), D3-FA (File Analysis)