External Web Reconnaissance
External web recon is the first stage of web security testing: identify assets, technologies, and exposure before deeper testing.
> **Info:** Only perform reconnaissance on assets you are authorized to test.
Recon Types
- Active Recon: Direct interaction with the target (scans/probes). Faster, but noisy.
- Passive Recon: Public data collection (CT logs, search engines, datasets). Lower risk of detection.
Vhosts vs Subdomains
| Feature | Vhosts | Subdomains |
|---|---|---|
| Definition | Multiple sites on one server | Separate labels under a parent domain |
| Config layer | Web server config | DNS records |
| Example | site1.example.com and site2.example.com served from one IP | blog.example.com, shop.example.com |
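Because vhosts live in web-server configuration rather than DNS, they can be probed by varying the Host header against a single IP. A minimal sketch, where the IP, hostname, and the `probe_vhost` helper name are illustrative placeholders:

```shell
# Hypothetical helper: request one IP with an arbitrary Host header and
# report status code plus response size; a size that differs from the
# baseline response hints at a distinct vhost behind the same IP.
probe_vhost() {
  ip="$1"; name="$2"
  curl -s -o /dev/null -w '%{http_code} %{size_download}' -H "Host: $name" "http://$ip/"
}
# Usage (against an authorized target only):
#   probe_vhost 203.0.113.10 site1.example.com
```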
Passive Reconnaissance
Subdomain Enumeration
Certificate Transparency / APIs
# crt.sh
domain="target.com"
curl -s "https://crt.sh/?q=%25.$domain&output=json" \
| jq -r '.[] | .name_value' \
| sed 's/*\.//g' | sort -u > crtsh.txt
# hackertarget
curl -s "https://api.hackertarget.com/hostsearch/?q=$domain" \
| cut -d, -f1 | sort -u > hackertarget.txt
# certspotter
curl -s "https://api.certspotter.com/v1/issuances?domain=$domain&include_subdomains=true&expand=dns_names" \
| jq -r '.[].dns_names[]' | sed 's/*\.//g' | sort -u > certspotter.txt
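CT sources return overlapping, mixed-case, and wildcard entries, so it helps to normalize before merging. A small sketch; `clean_hosts` is an ad-hoc helper, not a standard tool:

```shell
# Lowercase, strip leading "*." wildcards and blank lines, then dedupe.
clean_hosts() {
  tr 'A-Z' 'a-z' | sed 's/^\*\.//' | grep -v '^$' | sort -u
}
# Usage: cat crtsh.txt certspotter.txt | clean_hosts > ct_clean.txt
```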
Other passive sources
# AlienVault OTX
host="target.com"
curl -s "https://otx.alienvault.com/api/v1/indicators/domain/$host/url_list?limit=100&page=1" \
| grep -o '"hostname": *"[^"]*' | sed 's/"hostname": "//' | sort -u > alienvault.txt
# subdomain.center
curl -s "https://api.subdomain.center/?domain=$host" | jq -r '.[]' | sort -u > subcenter.txt
Tooling
subfinder -d target.com -silent -all -recursive -o subfinder.txt
assetfinder --subs-only target.com > assetfinder.txt
sublist3r -d target.com -o sublist3r.txt
amass enum --passive -d target.com -o amass.txt
findomain --target target.com --output  # --output takes no value; results are written to target.com.txt
Merge and deduplicate
cat *.txt | sort -u > subdomains.txt
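Third-party sources occasionally return names outside the target's domain, so filtering the merged list to scope is worthwhile. A sketch; `in_scope` is an ad-hoc helper:

```shell
# Keep only the apex domain and its subdomains.
domain="target.com"
esc=$(printf '%s' "$domain" | sed 's/\./\\./g')  # escape dots for the regex
in_scope() { grep -E "(^|\.)${esc}\$"; }
# Usage: in_scope < subdomains.txt > in_scope.txt
```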
Filter alive hosts
dnsx -l subdomains.txt -silent > alive.txt
Resolve IP addresses for alive hosts
while read -r subdomain; do
  host "$subdomain" | awk '/has address/ {print $NF}' >> ip_addresses.txt
done < alive.txt
# Alternative
dnsx -l alive.txt -a -resp-only -o ip_addresses.txt  # -a -resp-only prints the resolved IPs instead of the hostnames
Port scan resolved IPs
# Nmap
nmap -iL ip_addresses.txt -p 1-65535 -T4 -oN nmap_results.txt
# masscan
masscan -iL ip_addresses.txt -p1-65535 -oX masscan_results.xml
# naabu
naabu -iL ip_addresses.txt -o naabu_results.txt
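naabu emits one `host:port` line per open port; grouping ports per host makes follow-up scans easier to target. A sketch using awk; `ports_by_host` is an ad-hoc name:

```shell
# Collapse "host:port" lines into "host port1,port2,..." lines.
ports_by_host() {
  awk -F: '{p[$1] = (p[$1] == "") ? $2 : p[$1] "," $2}
           END {for (h in p) print h, p[h]}'
}
# Usage: ports_by_host < naabu_results.txt
```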
Identify web services with httpx
cat alive.txt | httpx -p 80,443,8080 -silent -o web_services.txt
cat alive.txt | httpx -p 80,443,8080 -status-code -title -tech-detect -silent
Screenshotting
# gowitness
gowitness scan file -f web_services.txt --write-db
gowitness report server
# eyewitness
eyewitness --web -f web_services.txt -d eyewitness_output_target
Branch Determination and Separation
You can branch targets for note-taking/visualization with unhttpx + rmm:
# Canvas output
cat web_services.txt | unhttpx | rmm -o obsidian > target.canvas
# Markdown output
cat web_services.txt | unhttpx | rmm -o markdown > target.md
Post-Scan Workflow (After Nmap/Naabu/Masscan/Gowitness)
- Review open services and versions.
- Check outdated software and known CVEs.
- Identify misconfigurations/default settings.
- Gather usernames/credentials/artifacts during recon.
- Test auth surfaces (web login, FTP, MySQL, MSSQL, SMTP where applicable).
- Enumerate hidden URLs/endpoints (gau, crawlers, archival sources).
- Test 401/403 bypasses and restricted paths.
- Check cloud storage exposures (e.g., S3 references).
- Hunt leaked API keys in JS/git artifacts.
- Validate OWASP Top 10 classes and API-specific issues.
- Fuzz parameters and input fields.
- Verify CRUD authorization and business logic paths.
- Understand app architecture flow (frontend ↔ API ↔ backend/database).
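For the 401/403 bypass step above, it can help to generate candidate path variants up front. A deliberately small sample; `bypass_variants` is an ad-hoc helper, and real bypass wordlists are far larger:

```shell
# Print a few classic path-normalization bypass variants for one path.
bypass_variants() {
  p="$1"
  printf '%s\n' "$p/" "$p/." "$p//" "$p%20" "$p..;/" "/.$p"
}
# Usage: bypass_variants /admin | while read -r v; do
#   curl -s -o /dev/null -w "%{http_code} $v\n" "https://target.com$v"
# done
```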
> **Tip:** Track findings per host/service in separate notes to avoid losing context during large-scope recon.
JS File Discovery Tools
- JSParser
- Katana
- LinkFinder
- GetJS
- InputScanner
- JSScan
- Manual source analysis
Crawling
# Basic crawl
echo https://target.com | katana
# Crawl and extract JS links
cat web_services.txt | katana -silent -jc | grep -E '\.js($|\?)' > js_files_links.txt
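Once JS files are downloaded, a rough grep over their contents often surfaces API endpoints. A heuristic sketch that will produce false positives and needs manual review; `js_endpoints` is an ad-hoc helper:

```shell
# Extract quoted absolute paths (a common shape for API routes) from JS files.
js_endpoints() {
  grep -hoE '"/[A-Za-z0-9_./-]+"' "$@" | tr -d '"' | sort -u
}
# Usage: js_endpoints app.js bundle.js
```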
Google Dorking Cheatsheet (Quick Set)
site:example.com
inurl:admin
intitle:"index of"
filetype:bak
filetype:log
inurl:"phpinfo"
inurl:admin OR inurl:login
site:example.com -inurl:"admin"
"internal error" site:example.com
Useful dork collections: