External Web Reconnaissance

External web reconnaissance is the first stage of web security testing: identify assets, technologies, and exposure before any deeper testing begins.

info

Only perform reconnaissance on assets you are authorized to test.

Recon Types

  • Active Recon: Direct interaction with the target (scans/probes). Faster, but noisy.
  • Passive Recon: Public data collection (CT logs, search engines, datasets). Lower risk of detection.

Vhosts vs Subdomains

Feature      | Vhosts                               | Subdomains
Definition   | Multiple sites on one server         | Separate labels under a parent domain
Config layer | Web server config                    | DNS records
Example      | site1.example.com, site2.example.com | blog.example.com, shop.example.com

Passive Reconnaissance

Subdomain Enumeration

Certificate Transparency / APIs

# crt.sh
domain="target.com"
curl -s "https://crt.sh/?q=%25.$domain&output=json" \
| jq -r '.[] | .name_value' \
| sed 's/\*\.//g' | sort -u > crtsh.txt

# hackertarget
curl -s "https://api.hackertarget.com/hostsearch/?q=$domain" \
| cut -d, -f1 | sort -u > hackertarget.txt

# certspotter
curl -s "https://api.certspotter.com/v1/issuances?domain=$domain&include_subdomains=true&expand=dns_names" \
| jq -r '.[].dns_names[]' | sed 's/\*\.//g' | sort -u > certspotter.txt
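CT-log APIs often return wildcard entries (`*.target.com`) along with mixed-case and plain duplicates, which the `sed`/`sort -u` steps above normalize. A self-contained sketch of that normalization on made-up sample names:

```shell
# Made-up CT-log names: a wildcard entry, a mixed-case duplicate, a plain duplicate
printf '%s\n' '*.target.com' 'www.target.com' 'WWW.target.com' 'api.target.com' 'api.target.com' \
  | sed 's/\*\.//g' \
  | tr 'A-Z' 'a-z' \
  | sort -u > ct_normalized.txt

cat ct_normalized.txt
# api.target.com
# target.com
# www.target.com
```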

Other passive sources

# AlienVault OTX
host="target.com"
curl -s "https://otx.alienvault.com/api/v1/indicators/domain/$host/url_list?limit=100&page=1" \
| grep -o '"hostname": *"[^"]*' | sed 's/"hostname": "//' | sort -u > alienvault.txt

# subdomain.center
curl -s "https://api.subdomain.center/?domain=$host" | jq -r '.[]' | sort -u > subcenter.txt
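Before merging everything, it can be worth checking what each source contributes that the others missed; `comm` does this on sorted files. A sketch with made-up result files:

```shell
# Made-up, already-sorted results from two passive sources
printf '%s\n' 'a.target.com' 'b.target.com' > src1.txt
printf '%s\n' 'b.target.com' 'c.target.com' > src2.txt

# comm -13: suppress lines unique to src1 and lines common to both,
# leaving only names src2 found that src1 missed
comm -13 src1.txt src2.txt > src2_unique.txt

cat src2_unique.txt
# c.target.com
```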

Tooling

subfinder -d target.com -silent -all -recursive -o subfinder.txt
assetfinder --subs-only target.com > assetfinder.txt
sublist3r -d target.com -o sublist3r.txt
amass enum -passive -d target.com -o amass.txt
findomain --target target.com --output   # --output takes no value; writes target.com.txt

Merge and deduplicate

cat *.txt | sort -u > subdomains.txt
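A blind `cat *.txt | sort -u` can keep out-of-scope lookalikes that some APIs return (e.g. `eviltarget.com` when hunting `target.com`). One way to lowercase and scope-filter during the merge, shown here on made-up input rather than the actual result files:

```shell
domain="target.com"

# Made-up merged output containing a case duplicate and an out-of-scope lookalike
# (note: dots in $domain act as regex wildcards here; escape them for strict matching)
printf '%s\n' 'blog.target.com' 'Blog.Target.com' 'eviltarget.com' 'shop.target.com' \
  | tr 'A-Z' 'a-z' \
  | grep -E "(^|\.)${domain}$" \
  | sort -u > subdomains_scoped.txt

cat subdomains_scoped.txt
# blog.target.com
# shop.target.com
```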

Filter alive hosts

dnsx -l subdomains.txt -silent > alive.txt

Resolve IP addresses for alive hosts

while read -r subdomain; do
host "$subdomain" | awk '/has address/ {print $NF}' >> ip_addresses.txt
done < alive.txt

# Alternative: have dnsx resolve A records and print only the IPs
dnsx -l alive.txt -a -resp-only -o ip_addresses.txt

Port scan resolved IPs

# Nmap
nmap -iL ip_addresses.txt -p 1-65535 -T4 -oN nmap_results.txt

# masscan
masscan -iL ip_addresses.txt -p1-65535 -oX masscan_results.xml

# naabu
naabu -iL ip_addresses.txt -o naabu_results.txt
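naabu writes one `host:port` pair per line; grouping the pairs per host gives a quicker overview for the service checks that follow. A sketch on made-up output:

```shell
# Made-up naabu-style results: one host:port pair per line
cat > naabu_sample.txt <<'EOF'
app.target.com:80
app.target.com:443
db.target.com:3306
EOF

# Collapse to one "host: port,port,..." line per host
awk -F: '{ ports[$1] = ports[$1] ? ports[$1] "," $2 : $2 }
         END { for (h in ports) print h ": " ports[h] }' naabu_sample.txt \
  | sort > ports_by_host.txt

cat ports_by_host.txt
# app.target.com: 80,443
# db.target.com: 3306
```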

Identify web services with httpx

cat alive.txt | httpx -p 80,443,8080 -silent -o web_services.txt
cat alive.txt | httpx -p 80,443,8080 -status-code -title -tech-detect -silent
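With `-status-code`, httpx appends the code in brackets after each URL (exact formatting can vary between versions). Filtering that output for responses worth a closer look, sketched on made-up sample lines:

```shell
# Made-up httpx-style output: "url [status]"
cat > httpx_sample.txt <<'EOF'
https://app.target.com [200]
https://old.target.com [404]
https://admin.target.com [403]
EOF

# 200s are live content; 403s often hide restricted paths worth bypass testing
grep -E '\[(200|403)\]$' httpx_sample.txt > interesting.txt

cat interesting.txt
# https://app.target.com [200]
# https://admin.target.com [403]
```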

Screenshotting

# gowitness
gowitness scan file -f web_services.txt --write-db
gowitness report server

# eyewitness
eyewitness --web -f web_services.txt -d eyewitness_output_target

Branch Determination and Separation

You can branch targets for note-taking/visualization with unhttpx + rmm:

# Canvas output
cat web_services.txt | unhttpx | rmm -o obsidian > target.canvas

# Markdown output
cat web_services.txt | unhttpx | rmm -o markdown > target.md

Post-Scan Workflow (After Nmap/Naabu/Masscan/Gowitness)

  1. Review open services and versions.
  2. Check outdated software and known CVEs.
  3. Identify misconfigurations/default settings.
  4. Gather usernames/credentials/artifacts during recon.
  5. Test auth surfaces (web login, FTP, MySQL, MSSQL, SMTP where applicable).
  6. Enumerate hidden URLs/endpoints (gau, crawlers, archival sources).
  7. Test 401/403 bypasses and restricted paths.
  8. Check cloud storage exposures (e.g., S3 references).
  9. Hunt leaked API keys in JS/git artifacts.
  10. Validate OWASP Top 10 classes and API-specific issues.
  11. Fuzz parameters and input fields.
  12. Verify CRUD authorization and business logic paths.
  13. Understand app architecture flow (frontend ↔ API ↔ backend/database).

tip

Track findings per host/service in separate notes to avoid losing context during large-scope recon.
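Step 1 of the workflow above can be partly automated by parsing nmap's grepable output (`-oG`). A sketch on a single made-up result line:

```shell
# Made-up line in nmap's grepable (-oG) output format
cat > nmap_grepable_sample.txt <<'EOF'
Host: 203.0.113.10 (app.target.com) Ports: 22/open/tcp//ssh///, 80/open/tcp//http///, 443/open/tcp//https///
EOF

# Print ip:port for every port reported as open
awk '/Ports:/ {
  ip = $2
  for (i = 1; i <= NF; i++)
    if ($i ~ /\/open\//) { split($i, a, "/"); print ip ":" a[1] }
}' nmap_grepable_sample.txt > open_ports.txt

cat open_ports.txt
# 203.0.113.10:22
# 203.0.113.10:80
# 203.0.113.10:443
```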

JS File Discovery Tools

  • JSParser
  • Katana
  • LinkFinder
  • GetJS
  • InputScanner
  • JSScan
  • Manual source analysis

Crawling

# Basic crawl
echo https://target.com | katana

# Crawl and extract JS links
cat web_services.txt | katana -silent -jc | grep '\.js' > js_files_links.txt
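Once JS files are collected, a rough grep pass can surface hardcoded URL paths before running a dedicated tool like LinkFinder. A heavily simplified sketch on a made-up JS file:

```shell
# Made-up JS of the kind the crawl above collects
cat > app_sample.js <<'EOF'
fetch("/api/v1/users");
const adminPath = "/api/v1/admin/config";
console.log("hello");
EOF

# Grab double-quoted strings that look like URL paths (very rough heuristic)
grep -oE '"/[a-zA-Z0-9_/.-]+"' app_sample.js | tr -d '"' | sort -u > js_endpoints.txt

cat js_endpoints.txt
# /api/v1/admin/config
# /api/v1/users
```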

Google Dorking Cheatsheet (Quick Set)

site:example.com
inurl:admin
intitle:"index of"
filetype:bak
filetype:log
inurl:"phpinfo"
inurl:admin OR inurl:login
site:example.com -inurl:"admin"
"internal error" site:example.com
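The cheatsheet above combines naturally with a scope domain: a small loop can generate ready-to-paste scoped queries from a template list (a trivial sketch; extend the list as needed):

```shell
domain="example.com"

# Build "site:<domain> <dork>" queries from a template list
for dork in 'inurl:admin' 'intitle:"index of"' 'filetype:bak'; do
  printf 'site:%s %s\n' "$domain" "$dork"
done > dorks.txt

cat dorks.txt
# site:example.com inurl:admin
# site:example.com intitle:"index of"
# site:example.com filetype:bak
```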

Useful dork collections:

  1. https://www.exploit-db.com/google-hacking-database
  2. https://github.com/Ishanoshada/GDorks