External Web Reconnaissance

External web reconnaissance is the first stage of web security testing: identify assets, technologies, and exposure before any deeper testing begins.

info

Only perform reconnaissance on assets you are authorized to test.

Recon Types

  • Active Recon: Direct interaction with the target (scans/probes). Faster, but noisy.
  • Passive Recon: Public data collection (CT logs, search engines, datasets). Lower risk of detection.

Vhosts vs Subdomains

Feature      | Vhosts                               | Subdomains
Definition   | Multiple sites on one server         | Separate labels under a parent domain
Config layer | Web server config                    | DNS records
Example      | site1.example.com, site2.example.com | blog.example.com, shop.example.com

Passive Reconnaissance

Subdomain Enumeration

Certificate Transparency / APIs

# crt.sh
domain="target.com"
curl -s "https://crt.sh/?q=%25.$domain&output=json" \
| jq -r '.[] | .name_value' \
| sed 's/\*\.//g' | sort -u > crtsh.txt

# hackertarget
curl -s "https://api.hackertarget.com/hostsearch/?q=$domain" \
| cut -d, -f1 | sort -u > hackertarget.txt

# certspotter
curl -s "https://api.certspotter.com/v1/issuances?domain=$domain&include_subdomains=true&expand=dns_names" \
| jq -r '.[].dns_names[]' | sed 's/\*\.//g' | sort -u > certspotter.txt
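CT-log APIs often return wildcard entries (`*.target.com`) along with mixed-case and plain duplicates, which the `sed`/`sort -u` steps above normalize. A self-contained sketch of that normalization on made-up sample names:

```shell
# Made-up CT-log names: a wildcard entry, a mixed-case duplicate, a plain duplicate
printf '%s\n' '*.target.com' 'www.target.com' 'WWW.target.com' 'api.target.com' 'api.target.com' \
  | sed 's/\*\.//g' \
  | tr 'A-Z' 'a-z' \
  | sort -u > ct_normalized.txt

cat ct_normalized.txt
# api.target.com
# target.com
# www.target.com
```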

Other passive sources

# AlienVault OTX
host="target.com"
curl -s "https://otx.alienvault.com/api/v1/indicators/domain/$host/url_list?limit=100&page=1" \
| grep -o '"hostname": *"[^"]*' | sed 's/"hostname": "//' | sort -u > alienvault.txt

# subdomain.center
curl -s "https://api.subdomain.center/?domain=$host" | jq -r '.[]' | sort -u > subcenter.txt
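Before merging everything, it can be worth checking what each source contributes that the others missed; `comm` does this on sorted files. A sketch with made-up result files:

```shell
# Made-up, already-sorted results from two passive sources
printf '%s\n' 'a.target.com' 'b.target.com' > src1.txt
printf '%s\n' 'b.target.com' 'c.target.com' > src2.txt

# comm -13: suppress lines unique to src1 and lines common to both,
# leaving only names src2 found that src1 missed
comm -13 src1.txt src2.txt > src2_unique.txt

cat src2_unique.txt
# c.target.com
```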

Tooling

subfinder -d target.com -silent -all -recursive -o subfinder.txt
assetfinder --subs-only target.com > assetfinder.txt
sublist3r -d target.com -o sublist3r.txt
amass enum -passive -d target.com -o amass.txt
findomain --target target.com --output   # --output takes no value; writes target.com.txt

Merge and deduplicate

cat *.txt | sort -u > subdomains.txt
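A blind `cat *.txt | sort -u` can keep out-of-scope lookalikes that some APIs return (e.g. `eviltarget.com` when hunting `target.com`). One way to lowercase and scope-filter during the merge, shown here on made-up input rather than the actual result files:

```shell
domain="target.com"

# Made-up merged output containing a case duplicate and an out-of-scope lookalike
# (note: dots in $domain act as regex wildcards here; escape them for strict matching)
printf '%s\n' 'blog.target.com' 'Blog.Target.com' 'eviltarget.com' 'shop.target.com' \
  | tr 'A-Z' 'a-z' \
  | grep -E "(^|\.)${domain}$" \
  | sort -u > subdomains_scoped.txt

cat subdomains_scoped.txt
# blog.target.com
# shop.target.com
```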

Filter alive hosts

dnsx -l subdomains.txt -silent > alive.txt

Resolve IP addresses for alive hosts

while read -r subdomain; do
host "$subdomain" | awk '/has address/ {print $NF}' >> ip_addresses.txt
done < alive.txt

# Alternative: have dnsx resolve A records and print only the IPs
dnsx -l alive.txt -a -resp-only -o ip_addresses.txt

Port scan resolved IPs

# Nmap
nmap -iL ip_addresses.txt -p 1-65535 -T4 -oN nmap_results.txt

# masscan
masscan -iL ip_addresses.txt -p1-65535 -oX masscan_results.xml

# naabu
naabu -iL ip_addresses.txt -o naabu_results.txt
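naabu writes one `host:port` pair per line; grouping the pairs per host gives a quicker overview for the service checks that follow. A sketch on made-up output:

```shell
# Made-up naabu-style results: one host:port pair per line
cat > naabu_sample.txt <<'EOF'
app.target.com:80
app.target.com:443
db.target.com:3306
EOF

# Collapse to one "host: port,port,..." line per host
awk -F: '{ ports[$1] = ports[$1] ? ports[$1] "," $2 : $2 }
         END { for (h in ports) print h ": " ports[h] }' naabu_sample.txt \
  | sort > ports_by_host.txt

cat ports_by_host.txt
# app.target.com: 80,443
# db.target.com: 3306
```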

Identify web services with httpx

cat alive.txt | httpx -p 80,443,8080 -silent -o web_services.txt
cat alive.txt | httpx -p 80,443,8080 -status-code -title -tech-detect -silent
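With `-status-code`, httpx appends the code in brackets after each URL (exact formatting can vary between versions). Filtering that output for responses worth a closer look, sketched on made-up sample lines:

```shell
# Made-up httpx-style output: "url [status]"
cat > httpx_sample.txt <<'EOF'
https://app.target.com [200]
https://old.target.com [404]
https://admin.target.com [403]
EOF

# 200s are live content; 403s often hide restricted paths worth bypass testing
grep -E '\[(200|403)\]$' httpx_sample.txt > interesting.txt

cat interesting.txt
# https://app.target.com [200]
# https://admin.target.com [403]
```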

Screenshotting

# gowitness
gowitness scan file -f web_services.txt --write-db
gowitness report server

# eyewitness
eyewitness --web -f web_services.txt -d eyewitness_output_target

Branch Determination and Separation

You can branch targets for note-taking/visualization with unhttpx + rmm:

# Canvas output
cat web_services.txt | unhttpx | rmm -o obsidian > target.canvas

# Markdown output
cat web_services.txt | unhttpx | rmm -o markdown > target.md

Post-Scan Workflow (After Nmap/Naabu/Masscan/Gowitness)

  1. Review open services and versions.
  2. Check outdated software and known CVEs.
  3. Identify misconfigurations/default settings.
  4. Gather usernames/credentials/artifacts during recon.
  5. Test auth surfaces (web login, FTP, MySQL, MSSQL, SMTP where applicable).
  6. Enumerate hidden URLs/endpoints (gau, crawlers, archival sources).
  7. Test 401/403 bypasses and restricted paths.
  8. Check cloud storage exposures (e.g., S3 references).
  9. Hunt leaked API keys in JS/git artifacts.
  10. Validate OWASP Top 10 classes and API-specific issues.
  11. Fuzz parameters and input fields.
  12. Verify CRUD authorization and business logic paths.
  13. Understand app architecture flow (frontend ↔ API ↔ backend/database).

tip

Track findings per host/service in separate notes to avoid losing context during large-scope recon.
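Step 1 of the workflow above can be partly automated by parsing nmap's grepable output (`-oG`). A sketch on a single made-up result line:

```shell
# Made-up line in nmap's grepable (-oG) output format
cat > nmap_grepable_sample.txt <<'EOF'
Host: 203.0.113.10 (app.target.com) Ports: 22/open/tcp//ssh///, 80/open/tcp//http///, 443/open/tcp//https///
EOF

# Print ip:port for every port reported as open
awk '/Ports:/ {
  ip = $2
  for (i = 1; i <= NF; i++)
    if ($i ~ /\/open\//) { split($i, a, "/"); print ip ":" a[1] }
}' nmap_grepable_sample.txt > open_ports.txt

cat open_ports.txt
# 203.0.113.10:22
# 203.0.113.10:80
# 203.0.113.10:443
```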

JS File Discovery Tools

  • JSParser
  • Katana
  • LinkFinder
  • GetJS
  • InputScanner
  • JSScan
  • Manual source analysis

Crawling

# Basic crawl
echo https://target.com | katana

# Crawl and extract JS links
cat web_services.txt | katana -silent -jc | grep '\.js' > js_files_links.txt
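Once JS files are collected, a rough grep pass can surface hardcoded URL paths before running a dedicated tool like LinkFinder. A heavily simplified sketch on a made-up JS file:

```shell
# Made-up JS of the kind the crawl above collects
cat > app_sample.js <<'EOF'
fetch("/api/v1/users");
const adminPath = "/api/v1/admin/config";
console.log("hello");
EOF

# Grab double-quoted strings that look like URL paths (very rough heuristic)
grep -oE '"/[a-zA-Z0-9_/.-]+"' app_sample.js | tr -d '"' | sort -u > js_endpoints.txt

cat js_endpoints.txt
# /api/v1/admin/config
# /api/v1/users
```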

Google Dorking Cheatsheet (Quick Set)

site:example.com
inurl:admin
intitle:"index of"
filetype:bak
filetype:log
inurl:"phpinfo"
inurl:admin OR inurl:login
site:example.com -inurl:"admin"
"internal error" site:example.com
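The cheatsheet above combines naturally with a scope domain: a small loop can generate ready-to-paste scoped queries from a template list (a trivial sketch; extend the list as needed):

```shell
domain="example.com"

# Build "site:<domain> <dork>" queries from a template list
for dork in 'inurl:admin' 'intitle:"index of"' 'filetype:bak'; do
  printf 'site:%s %s\n' "$domain" "$dork"
done > dorks.txt

cat dorks.txt
# site:example.com inurl:admin
# site:example.com intitle:"index of"
# site:example.com filetype:bak
```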

Useful dork collections:

  1. https://www.exploit-db.com/google-hacking-database
  2. https://github.com/Ishanoshada/GDorks