Cloud Resources
Cloud storage misconfigurations are one of the easiest wins in modern engagements. Companies create S3 buckets / Azure blobs / GCS buckets for legitimate reasons, then mis-set the access policy and put production data in something the entire internet can read. The goal of this stage: find the buckets the organization owns, before touching them, so you can hand the customer a clean “your bucket X is publicly readable” finding.
    # 1. DNS-resident clues - cloud storage hostnames in the organization's DNS
    grep -E '(amazonaws|blob.core.windows|storage.googleapis)' subdomains.txt

    # 2. Google dorks - find indexed bucket URLs
    "site:s3.amazonaws.com inlanefreight"
    "site:blob.core.windows.net inlanefreight"

    # 3. Source-code crawl - buckets referenced in JS/CSS/HTML of the main site
    curl -s https://target.com | grep -Eo 'https?://[^"]*\.(amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com)[^"]*'

    # 4. Third-party indexes - public buckets indexed by GrayHatWarfare and friends
    # https://buckets.grayhatwarfare.com/ (search by company name or abbreviation)

Success indicator: a list of bucket URLs owned by the target, classified by access level (public, partial, restricted) and content category (static assets, backups, customer data, credentials).
Why cloud storage matters
Every major cloud provider exposes object storage that defaults to private but can be made public with a single config flag. The flag exists for a reason - static websites, public downloads, file sharing - but it gets set incorrectly all the time. A non-exhaustive list of what’s been found in public buckets across real engagements:
- Customer PII (names, emails, addresses, SSNs)
- Backups of production databases
- API keys and cloud credentials (which compound the breach)
- Private SSH keys (immediate lateral movement onto everything those keys authorize)
- Source code with hardcoded secrets
- Internal documents, contracts, sales pipelines
- CI/CD artifacts including private container images
A single publicly readable backup bucket is often the entire engagement.
Identifying cloud storage from DNS
Cloud storage gets named via DNS for the same reason any other service does - humans need to find it. Look at your subdomain list from Domains & Subdomains for cloud-provider hostnames:
    # After resolving subdomains to IPs (or to CNAMEs):
    host -t cname assets.inlanefreight.com
    assets.inlanefreight.com is an alias for s3-website-us-west-2.amazonaws.com.

Or check the resolved IPs against known cloud ranges:

    # AWS publishes its IP ranges as JSON
    curl -s https://ip-ranges.amazonaws.com/ip-ranges.json | jq .

When you find a *.amazonaws.com alias, the actual bucket name often follows a predictable pattern. s3-website-us-west-2.amazonaws.com is the regional endpoint; the bucket is usually accessed at one of:

    <bucket>.s3.amazonaws.com
    <bucket>.s3-us-west-2.amazonaws.com
    <bucket>.s3-website-us-west-2.amazonaws.com

If assets.inlanefreight.com aliases to the S3 website endpoint, the bucket is very likely named assets.inlanefreight.com (the alias works because the bucket name matches the host header).
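
The alias-to-bucket reasoning above can be sketched as a small shell helper. This is a minimal illustration, not a complete resolver: the function name bucket_from_cname is hypothetical, and it only covers the S3 hostname shapes listed on this page.

```shell
# Hypothetical helper: guess the S3 bucket name behind a DNS alias.
# Usage: bucket_from_cname <hostname> <cname-target>
bucket_from_cname() {
    host=${1%.}      # strip a trailing dot, as printed by `host`
    target=${2%.}
    case "$target" in
        s3-website[-.]*.amazonaws.com)
            # Bare website endpoint: S3 routes on the Host header,
            # so the bucket is expected to share the aliased hostname.
            printf '%s\n' "$host" ;;
        *.s3.amazonaws.com|*.s3[-.]*.amazonaws.com)
            # Virtual-hosted style: the bucket is everything before ".s3".
            printf '%s\n' "${target%%.s3[-.]*}" ;;
        *)
            # Not an S3 hostname shape this sketch recognizes.
            return 1 ;;
    esac
}
```

For example, `bucket_from_cname assets.inlanefreight.com s3-website-us-west-2.amazonaws.com.` prints assets.inlanefreight.com. Treat the result as a candidate to verify, not a confirmed bucket.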
Provider naming conventions
| Provider | Storage type | Hostname pattern |
|---|---|---|
| AWS | S3 | <bucket>.s3.amazonaws.com, <bucket>.s3.<region>.amazonaws.com, <bucket>.s3-<region>.amazonaws.com (legacy) |
| AWS | S3 website | <bucket>.s3-website-<region>.amazonaws.com |
| Azure | Blob storage | <account>.blob.core.windows.net/<container>/ |
| Azure | File storage | <account>.file.core.windows.net/<share>/ |
| GCP | Cloud Storage | storage.googleapis.com/<bucket>/, <bucket>.storage.googleapis.com |
| DigitalOcean | Spaces | <space>.<region>.digitaloceanspaces.com |
| Backblaze | B2 | <bucket>.s3.<region>.backblazeb2.com (S3-compatible) |
| Linode | Object Storage | <bucket>.<region>.linodeobjects.com |
Useful pattern: companies tend to name buckets after either the application, the environment, or the data. So you might see:

    inlanefreight-assets
    inlanefreight-backups
    inlanefreight-prod
    inlanefreight-staging
    ilfreight-prod
    ilf-data

Wordlist bucket discovery is covered later in this page.
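
The hostname patterns in the provider table can be turned into a rough classifier for a resolved subdomain list. The function name and the labels it prints are assumptions made for this sketch; the patterns themselves come from the table.

```shell
# Hypothetical helper: label a hostname by the storage provider it matches.
classify_storage_host() {
    case "$1" in
        # Website endpoints first, since the generic S3 pattern also matches them.
        *.s3-website[-.]*.amazonaws.com)                 echo "aws-s3-website" ;;
        *.s3.amazonaws.com|*.s3[-.]*.amazonaws.com)      echo "aws-s3" ;;
        *.blob.core.windows.net)                         echo "azure-blob" ;;
        *.file.core.windows.net)                         echo "azure-file" ;;
        storage.googleapis.com|*.storage.googleapis.com) echo "gcp-storage" ;;
        *.digitaloceanspaces.com)                        echo "do-spaces" ;;
        *.backblazeb2.com)                               echo "backblaze-b2" ;;
        *.linodeobjects.com)                             echo "linode-object-storage" ;;
        *)                                               echo "unknown" ;;
    esac
}
```

Run it over each CNAME target rather than the customer-facing hostname, since the provider name only shows up in the alias.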
Google dorks
Google has indexed massive amounts of bucket content. Cloud provider URLs follow recognizable patterns; you can search for them with site: operators and narrow by content keywords.
AWS S3

    site:s3.amazonaws.com inlanefreight
    site:s3-us-west-2.amazonaws.com "inlanefreight" filetype:pdf
    inurl:s3.amazonaws.com "inlanefreight" intext:"confidential"

Azure Blob

    site:blob.core.windows.net "inlanefreight"
    inurl:blob.core.windows.net filetype:xlsx "inlanefreight"

Google Cloud Storage

    site:storage.googleapis.com "inlanefreight"
    inurl:storage.googleapis.com "inlanefreight" filetype:json

General cloud bucket dorking

    "AKIA" filetype:txt                             # AWS access key prefix in text files
    "AccountKey=" filetype:config                   # Azure storage account keys
    "google_application_credentials" filetype:json

Real engagements turn up PDFs (contracts, proposals), spreadsheets (customer lists, invoices), text dumps, JSON configs, presentations, and source code in indexed buckets. The PII surface is often immediate and severe.
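
When several company names and abbreviations need checking, the base dorks can be generated instead of typed by hand. A minimal sketch: make_dorks is a hypothetical name, and it only emits the site: and inurl: forms shown above for the three major providers.

```shell
# Hypothetical helper: print site:/inurl: dorks for one keyword.
make_dorks() {
    kw=$1
    for site in s3.amazonaws.com blob.core.windows.net storage.googleapis.com; do
        printf 'site:%s "%s"\n'  "$site" "$kw"
        printf 'inurl:%s "%s"\n' "$site" "$kw"
    done
}
```

Calling `make_dorks inlanefreight` prints six queries; append filetype: or intext: operators by hand where a narrower search is needed.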
Source-code references
Cloud storage URLs appear in HTML, CSS, and JavaScript of the company’s primary website. Images, fonts, scripts, and downloadable assets are loaded from buckets - that’s why the bucket exists in the first place.

    # Pull the main page and extract cloud storage references
    curl -sL https://target.com \
      | grep -Eo 'https?://[^"]*\.(amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com|digitaloceanspaces\.com)[^"]*' \
      | sort -u

    https://cdn-images.inlanefreight.com.s3.amazonaws.com/banner.png
    https://inlanefreight-static.s3-us-west-2.amazonaws.com/main.css
    https://inlanefreight-docs.blob.core.windows.net/public/whitepaper.pdf

Each unique hostname is a bucket worth investigating. The full URL is what’s referenced; the bucket itself often has additional content beyond what’s linked in HTML.
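
Since the hostname is the unit of investigation, it helps to collapse extracted URLs down to unique storage hosts. The sketch below assumes URLs or raw HTML arrive on stdin; bucket_hosts is a hypothetical name. For path-style URLs (for example storage.googleapis.com/<bucket>/) the bucket lives in the path, so only the provider host is printed.

```shell
# Hypothetical helper: reduce URLs on stdin to unique cloud-storage hostnames.
bucket_hosts() {
    grep -Eo 'https?://[^/"[:space:]]+' \
        | sed -E 's#^https?://##' \
        | grep -E '(amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com|digitaloceanspaces\.com)$' \
        | sort -u
}
```

Typical use would be to pipe the output of the curl command above, or any saved URL list, into `bucket_hosts`.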
For comprehensive crawling, use a tool like waybackurls or gau to pull URLs the Wayback Machine has archived for the domain - historical references often include staging buckets that aren’t linked anymore.

    waybackurls inlanefreight.com | grep -E 'amazonaws|blob|googleapis' | sort -u

GrayHatWarfare
GrayHatWarfare maintains a continuously updated index of public S3, Azure Blob, and GCS buckets along with their file contents. Search by:
- Bucket name (partial match)
- Filename / extension
- Content type
This service has cataloged hundreds of thousands of misconfigured buckets. Search for the company name, then for common abbreviations (companies often abbreviate in internal naming - InlaneFreight → ilf → if). Search for likely file types:

    inlanefreight          → matches bucket names
    ilf                    → matches abbreviations
    inlanefreight .sql     → backup dumps
    inlanefreight .pem     → private keys
    inlanefreight id_rsa   → SSH keys
    inlanefreight .env     → environment configs

Common high-value finds:
| Search | Why |
|---|---|
| .sql.gz .sql.bz2 .dump | Database backups |
| id_rsa *.pem *.ppk | Private keys |
| .env config.json credentials.json | App credentials |
| *.bak backup.tar.gz | Generic backups |
| *.kdbx secrets.txt | Password managers / secret stores |
A single SSH key in a public bucket can mean instant root on production servers. Verify the bucket truly belongs to the target before reporting - bucket names sometimes collide.
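
Once a file listing for a bucket is in hand, the high-value patterns from the table can be applied mechanically. A minimal sketch, assuming one object key per line on stdin; flag_high_value and its labels are hypothetical, and plain .sql is included alongside the compressed variants.

```shell
# Hypothetical helper: label object keys that match high-value filename patterns.
# Keys that match nothing are skipped.
flag_high_value() {
    while IFS= read -r name; do
        case "$name" in
            *.sql|*.sql.gz|*.sql.bz2|*.dump)      label="database-backup" ;;
            *id_rsa|*.pem|*.ppk)                  label="private-key" ;;
            *.env|*config.json|*credentials.json) label="app-credentials" ;;
            *.bak|*backup.tar.gz)                 label="generic-backup" ;;
            *.kdbx|*secrets.txt)                  label="secret-store" ;;
            *) continue ;;
        esac
        printf '%s\t%s\n' "$label" "$name"
    done
}
```

Filename matching only prioritizes what to look at first; it says nothing about whether the file actually holds what its name suggests.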
Bucket-name brute-force
Even when no DNS pointer or source-code link reveals a bucket, you can guess names by enumerating common patterns. Tools that automate this:
- s3scanner - checks bucket existence + access level for a wordlist of names
- cloud_enum - covers AWS, Azure, and GCP with one wordlist
- Bucket Stream - passive discovery via certificate transparency monitoring

    # cloud_enum example
    python cloud_enum.py -k inlanefreight -k ilfreight -k ilf -k if

The tool generates permutations (inlanefreight-prod, prod-inlanefreight, inlanefreight-backup, inlanefreight.us-west-2, etc.) and checks each against AWS, Azure, and GCP simultaneously.
Wordlist patterns that work

    {company}-prod      {company}-staging   {company}-dev
    {company}-backup    {company}-backups   {company}-bak
    {company}-data      {company}-public    {company}-assets
    {company}-uploads   {company}-files     {company}-images
    {company}-logs      {company}-archive   {company}-temp
    {company}.{env}     {env}.{company}     {company}{env}

Append regions for AWS: -us-east-1, -us-west-2, -eu-west-1, etc.
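
The patterns above expand into a candidate list with a short loop. This is an illustrative sketch, not how cloud_enum builds its permutations: gen_bucket_names is a hypothetical name and the suffix list is a trimmed subset of the patterns on this page.

```shell
# Hypothetical helper: expand one company keyword into candidate bucket names.
gen_bucket_names() {
    company=$1
    for word in prod staging dev backup backups data assets uploads logs; do
        printf '%s-%s\n' "$company" "$word"   # {company}-{env}
        printf '%s.%s\n' "$company" "$word"   # {company}.{env}
        printf '%s.%s\n' "$word" "$company"   # {env}.{company}
        printf '%s%s\n'  "$company" "$word"   # {company}{env}
    done
}
```

Run it once per keyword (inlanefreight, ilfreight, ilf) and feed the combined output to a scanner as its wordlist.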
domain.glass
domain.glass is a quick infrastructure aggregator. Useful for spotting cloud usage at a glance - it shows DNS, TLS, hosting, CDN classification, and a Cloudflare “Safe” / “Suspicious” rating in one view. The Cloudflare verdict is a good cue for layer-2 gateway notes: if Cloudflare protects the main site, you need to think about WAF bypass in later stages.
What to do with a bucket you find
For each bucket you discover that belongs to the target:
- List contents - aws s3 ls s3://bucket --no-sign-request for AWS; az storage blob list --account-name X --container-name Y --auth-mode login for Azure. Note that --auth-mode login authenticates with your own Azure identity, so it does not by itself demonstrate anonymous access; for an anonymous check, browse to https://X.blob.core.windows.net/Y?restype=container&comp=list.
- Classify access - public read, public read+write, requester-pays, fully private. The access policy is the finding.
- Sample content - what type of data is in there? Static website assets? Customer files? Backups?
- Note credentials within content - config files, .env, scripts often contain hardcoded keys.
- Don’t exfiltrate - confirm the bucket is open, document the access level, take screenshots, don’t pull gigabytes of customer data unless explicitly authorized.
The finding writes itself: “Bucket inlanefreight-backups is publicly readable and contains [N] database backup files spanning [date range]. Recommended remediation: set bucket ACL to private, audit access logs.”
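
For S3, a first-pass access classification can be read off the HTTP status of an unauthenticated request to the bucket root. The sketch below is an assumption-laden starting point: both function names are hypothetical, the status mapping reflects S3's usual anonymous responses (200 with a listing, 403 AccessDenied, 404 NoSuchBucket, a redirect when the bucket lives in another region), and a 403 on listing does not rule out individually readable objects.

```shell
# Hypothetical helper: map an HTTP status from the bucket root to an access label.
classify_bucket_status() {
    case "$1" in
        200)     echo "public-listing" ;;
        403)     echo "exists-access-denied" ;;
        404)     echo "not-found" ;;
        301|307) echo "exists-other-region" ;;
        *)       echo "unknown" ;;
    esac
}

# Hypothetical wrapper: probe one bucket name anonymously and print "name<TAB>label".
probe_bucket() {
    code=$(curl -s -o /dev/null -w '%{http_code}' "https://$1.s3.amazonaws.com/")
    printf '%s\t%s\n' "$1" "$(classify_bucket_status "$code")"
}
```

The probe issues a single GET and downloads nothing to disk, which keeps it in line with the "don’t exfiltrate" rule above.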
SSH keys and credentials in cloud buckets
A specific high-value finding worth calling out: SSH private keys in public buckets are a recurring pattern. Engineers under pressure store id_rsa somewhere “temporary” for a quick transfer; the bucket gets made public for a different file; the key sits there indefinitely.
If you find an SSH private key in a bucket:
- Identify what host it authorizes - try ssh -i found_key user@target_host
- Document the public key fingerprint and which hosts accept it
- Report immediately - this is a high-severity finding regardless of what else you do
Similarly for cloud credentials (AKIA... access keys, Azure storage account keys, GCP service account JSONs): don’t use the credentials for anything beyond confirming they’re valid (aws sts get-caller-identity or equivalent), then report.
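
Spotting these credential types in a sampled file can be done with a few grep patterns. A minimal sketch: scan_secrets and its labels are hypothetical, and the patterns cover only the markers named on this page (PEM private-key headers, AKIA-prefixed access key IDs, AccountKey= in Azure connection strings, and the service_account type field in GCP key files). It only reads the file; it never contacts a provider.

```shell
# Hypothetical helper: print one label per credential type found in a file.
scan_secrets() {
    file=$1
    grep -Eq -e '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' "$file" \
        && echo "private-key"
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    grep -Eq '(^|[^A-Z0-9])AKIA[A-Z0-9]{16}([^A-Z0-9]|$)' "$file" \
        && echo "aws-access-key-id"
    grep -q 'AccountKey=' "$file" \
        && echo "azure-storage-key"
    grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$file" \
        && echo "gcp-service-account"
    return 0
}
```

A match is a lead, not proof: expired keys and documentation samples trip the same patterns, so record what was matched and where before deciding how to validate it.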
Putting it together
A clean cloud-recon workflow:

1. Filter subdomain list → identify cloud-hosted hostnames from DNS data
2. Crawl primary site → extract bucket URLs from HTML/JS/CSS
3. Wayback archive → historical bucket references
4. Google dorks → indexed bucket content
5. GrayHatWarfare → public bucket index, by company name + abbreviations
6. cloud_enum brute → wordlist-based bucket discovery
7. Per bucket: access test → public/restricted classification
8. Per bucket: content sample → categorize, identify high-value content
9. Per bucket: secret scan → keys, credentials, PII within bucket contents

The output: a per-bucket inventory with access level, content category, and any embedded credentials. Buckets with credentials or PII become immediate findings; buckets that just host static assets get noted for completeness.
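
The per-bucket inventory can be kept as simple CSV rows. The sketch below is one possible layout, not a prescribed format: inventory_row, the column order, and the category labels are all assumptions, and the priority rule just encodes the sentence above (credentials or PII make a finding, everything else is a note).

```shell
# Hypothetical helper: emit one CSV inventory row with a derived priority.
# Usage: inventory_row <bucket> <access> <category> <secrets: yes|no>
inventory_row() {
    bucket=$1 access=$2 category=$3 secrets=$4
    case "$category:$secrets" in
        *:yes|pii:*|customer-data:*|credentials:*) priority="finding" ;;
        *)                                         priority="note" ;;
    esac
    printf '%s,%s,%s,%s,%s\n' "$bucket" "$access" "$category" "$secrets" "$priority"
}
```

For example, `inventory_row inlanefreight-backups public backups yes` yields a row ending in finding, while a static-assets bucket with no secrets ends in note.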