Hakrawler – Simple, Fast Web Crawler Designed For Easy, Quick Discovery Of Endpoints And Assets Within A Web Application

hakrawler is a Go web crawler designed for easy, quick discovery of endpoints and assets within a web application. It can be used to discover:

  • Forms
  • Endpoints
  • Subdomains
  • Related domains
  • JavaScript files

The goal is to make the tool easy to chain with other tools, such as subdomain enumeration tools and vulnerability scanners, for example:

assetfinder target.com | hakrawler | some-xss-scanner

Features

  • Unlimited, fast web crawling for endpoint discovery
  • Fuzzy matching for domain discovery
  • robots.txt parsing
  • sitemap.xml parsing
  • Plain output for easy parsing into other tools
  • Accept domains from stdin for easier tool chaining
  • SQLMap-friendly output format
  • Link gathering from JavaScript files

Upcoming features

  • Cleaner code
  • Want more? Submit a feature request!

Contributors

  • hakluke wrote the tool
  • cablej cleaned up the code
  • Corben Leo added in functionality to pull links from JavaScript files

Thanks

  • codingo and prodigysml/sml555, my favourite people to hack with. A constant source of ideas and inspiration. They also provided beta testing and a sounding board for this tool in development.
  • tomnomnom who wrote waybackurls, which powers the wayback part of this tool
  • s0md3v who wrote photon, which I took ideas from to create this tool
  • The folks from gocolly, the library which powers the crawler engine
  • oxffaa, who wrote a very efficient sitemap.xml parser which is used in this tool
  • The contributors of LinkFinder, from which some awesome regex was stolen to parse links from JavaScript files.

Installation

  1. Install Golang
  2. Run the command below:
go get github.com/hakluke/hakrawler
  3. Run hakrawler from your Go bin directory. On Linux systems it will likely be:
~/go/bin/hakrawler

Note that if you need to do this, you probably want to add your Go bin directory to your $PATH to make things easier!
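
The PATH change mentioned in the note can be sketched in a few lines of shell. This is a minimal sketch that assumes Go is using its default workspace under ~/go, so that binaries land in ~/go/bin; if your GOPATH or GOBIN points elsewhere, the directory will differ. The variable name GOBIN_DIR is only illustrative.

```shell
# Assumption: Go uses its default workspace, so installed binaries are in ~/go/bin.
GOBIN_DIR="$HOME/go/bin"

# Append the directory to PATH only if it is not already there.
case ":$PATH:" in
  *":$GOBIN_DIR:"*) ;;
  *) export PATH="$PATH:$GOBIN_DIR" ;;
esac
```

Putting these lines in a shell startup file such as ~/.bashrc or ~/.profile should make the change persistent, after which hakrawler can be run without typing the full path.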

Usage

Note: multiple domains can be crawled by piping them into hakrawler from stdin. If only a single domain is being crawled, it can be added by using the -domain flag.

$ hakrawler -h
Usage of hakrawler:
  -all
        Include everything in output - this is the default, so this option is superfluous (default true)
  -auth string
        The value of this will be included as a Authorization header
  -cookie string
        The value of this will be included as a Cookie header
  -depth int
        Maximum depth to crawl, the default is 1. Anything above 1 will include URLs from robots, sitemap, waybackurls and the initial crawler as a seed. Higher numbers take longer but yield more results. (default 1)
  -domain string
        The domain that you wish to crawl (for example, google.com)
  -forms
        Include form actions in output
  -js
        Include links to utilised JavaScript files
  -outdir string
        Directory to save discovered raw HTTP requests
  -plain
        Don't use colours or print the banners to allow for easier parsing
  -robots
        Include robots.txt entries in output
  -schema string
        Schema, http or https (default "http")
  -scope string
        Scope to include:
        strict = specified domain only
        subs = specified domain and subdomains
        fuzzy = anything containing the supplied domain
        yolo = everything (default "subs")
  -sitemap
        Include sitemap.xml entries in output
  -subs
        Include subdomains in output
  -urls
        Include URLs in output
  -usewayback
        Query wayback machine for URLs and add them as seeds for the crawler
  -wayback
        Include wayback machine entries in output
  -linkfinder
        Search all JavaScript files for more links. Note that these will not be complete links, only relative. Parsing full links from JavaScript is too resource intensive.
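
To illustrate how the stdin input and the flags above might be combined, here is a small wrapper sketch. The function name crawl_list and the idea of keeping domains in a file are hypothetical; the flags are taken from the help text, and the sketch assumes hakrawler is on your PATH and reads one domain per line from stdin as described in the note above.

```shell
# crawl_list FILE: feed every domain listed in FILE (one per line) to hakrawler.
# The function name and the input file are illustrative, not part of hakrawler.
crawl_list() {
  # -plain        drops the banner and colours so the output is easier to parse
  # -depth 2      crawls one level deeper than the default of 1
  # -scope strict keeps results on the specified domains only
  hakrawler -plain -depth 2 -scope strict < "$1"
}
```

With a file named domains.txt, it could be invoked as: crawl_list domains.txt > results.txt. Higher depth values take longer, so it may be worth starting with a short list.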

Basic Example

Command: hakrawler -domain bugcrowd.com -depth 1

Full text output:

   $ hakrawler -domain bugcrowd.com -depth 1

██╗ ██╗ █████╗ ██╗ ██╗██████╗ █████╗ ██╗ ██╗██╗ ███████╗██████╗
██║ ██║██╔══██╗██║ ██╔╝██╔══██╗██╔══██╗██║ ██║██║ ██╔════╝██╔══██╗
███████║██ ████║█████╔╝ ██████╔╝███████║██║ █╗ ██║██║ █████╗ ██████╔╝
██╔══██║██╔══██║██╔═██╗ ██╔══██╗██╔══██║██║███╗██║██║ ██╔══╝ ██╔══██╗
██║ ██║██║ ██║██║ ██╗█ ║ ██║██║ ██║╚███╔███╔╝███████╗███████╗██║ ██║
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚══╝╚══╝ ╚══════╝╚══════╝╚═╝ ╚═╝
Crafted with <3 by hakluke
[robots] http://bugcrowd.com/*?preview
[sitemap] https://bugcrowd.com/
[sitemap] https://bugcrowd.com/contact/
[sitemap] https://bugcrowd.com/faq/
[sitemap] https://bugcrowd.com/leaderboard/
[sitemap] https://bugcrowd.com/list-of-bug-bounty-programs/
[sitemap] https://bugcrowd.com/press/
[sitemap] https://bugcrowd.com/pricing/
[sitemap] https://bugcrowd.com/privacy/
[sitemap] https://bugcrowd.com/terms/
[sitemap] https://bugcrowd.com/resources/responsible-disclosure-program/
[sitemap] https://bugcrowd.com/resources/why-care-about-web-security/
[sitemap] https://bugcrowd.com/resources/what-is-a-bug-bounty/
[sitemap] https://bugcrowd.com/stories/movember/
[sitemap] https://bugcrowd.com/stories/riskio/
[sitemap] https://bugcrowd.com/stories/tagged/
[sitemap] https://bugcrowd.com/tour/
[sitemap] https://bugcrowd.com/tour/platform/
[sitemap] https://bugcrowd.com/tour/crowd/
[sitemap] https://bugcrowd.com/customers/programs/new
[sitemap] https://bugcrowd.com/portal/
[sitemap] https://bugcrowd.com/portal/user/sign_in/
[sitemap] https://bugcrowd.com/portal/user/sign_up/
[url] https://bugcrowd.com/user/sign_in
[subdomain] bugcrowd.com
[url] https://tracker.bugcrowd.com/user/sign_in
[subdomain] tracker.bugcrowd.com
[url] https://www.bugcrowd.com/
[subdomain] www.bugcrowd.com
[url] https://www.bugcrowd.com/products/how-it-works/
[url] https://www.bugcrowd.com/products/how-it-works/the-bugcrowd-difference/
[url] https://www.bugcrowd.com/products/platform/
[url] https://www.bugcrowd.com/products/platform/integrations/
[url] https://www.bugcrowd.com/products/platform/vulnerability-rating-taxonomy/
[url] https://www.bugcrowd.com/products/attack-surface-management/
[url] https://www.bugcrowd.com/products/bug-bounty/
[url] https://www.bugcrowd.com/products/vulnerability-disclosure/
[url] https://www.bugcrowd.com/products/next-gen-pen-test/
[url] https://www.bugcrowd.com/products/bug-bash/
[url] https://www.bugcrowd.com/resources/reports/priority-one-report
[url] https://www.bugcrowd.com/solutions/
[url] https://www.bugcrowd.com/solutions/financial-services/
[url] https://www.bugcrowd.com/solutions/healthcare/
[url] https://www.bugcrowd.com/solutions/retail/
[url] https://www.bugcrowd.com/solutions/automotive-security/
[url] https://www.bugcrowd.com/solutions/technology/
[url] https://www.bugcrowd.com/solutions/government/
[url] https://www.bugcrowd.com/solutions/security/
[url] https://www.bugcrowd.com/solutions/marketplace-apps/
[url] https://www.bugcrowd.com/customers/
[url] https://www.bugcrowd.com/hackers/
[url] https://bugcrowd.com/programs
[url] https://bugcrowd.com/crowdstream
[url] https://www.bugcrowd.com/bug-bounty-list/
[url] https://www.bugcrowd.com/hackers/faqs/
[url] https://www.bugcrowd.com/resources/help-wanted/
[url] https://www.bugcrowd.com/hackers/bugcrowd-university/
[url] https://www.bugcrowd.com/hackers/ambassador-program/
[url] https://forum.bugcrowd.com
[subdomain] forum.bugcrowd.com
[url] https://bugcrowd.com/leaderboard
[url] https://www.bugcrowd.com/resources/levelup-0x04
[url] https://www.bugcrowd.com/resources/
[url] https://www.bugcrowd.com/resources/webinars/
[url] https://www.bugcrowd.com/resources/bakers-dozen/
[url] https://www.bugcrowd.com/events/
[url] https://www.bugcrowd.com/resources/glossary/
[url] https://www.bugcrowd.com/resources/faqs/
[url] https://www.bugcrowd.com/about/
[url] https://www.bugcrowd.com/blog
[url] https://www.bugcrowd.com/about/expertise/
[url] https://www.bugcrowd.com/about/leadership/
[url] https://www.bugcrowd.com/about/press-releases/
[url] https://www.bugcrowd.com/about/careers/
[url] https://www.bugcrowd.com/partners/
[url] https://www.bugcrowd.com/about/news/
[url] https://www.bugcrowd.com/about/contact/
[url] https://bugcrowd.com/user/sign_up
[url] https://www.bugcrowd.com/get-started/
[url] https://www.bugcrowd.com/products/attack-surface-management
[url] https://www.bugcrowd.com/products/bug-bounty
[url] https://www.bugcrowd.com/customers/motorola
[url] https://www.bugcrowd.com/products/vulnerability-disclosure
[url] https://www.bugcrowd.com/products/next-gen-pen-test
[url] https://www.bugcrowd.com/resources/guides/esg-research-ciso-security-trends
[url] https://www.bugcrowd.com/events/join-us-at-rsa-2019-march-4-8-2019-san-francisco/
[url] https://www.bugcrowd.com/resources/4-reasons-to-swap-your-traditional-pen-test-with-a-next-gen-pen-test/
[url] https://www.bugcrowd.com/blog/november-2019-hall-of-fame/
[url] https://www.bugcrowd.com/blog/bugcrowd-launches-crowdstream-and-in-platform-coordinated-disclosure/
[url] https://www.bugcrowd.com/blog/the-future-is-now-2020-cybersecurity-predictions/
[url] https://www.bugcrowd.com/press-release/bugcrowd-launches-first-crowd-driven-approach-to-risk-based-asset-discovery-and-prioritization/
[url] https://www.bugcrowd.com/press-release/bugcrowd-university-expands-education-and-training-for-whitehat-hackers/
[url] https://www.bugcrowd.com/press-release/bugcrowd-announces-industrys-first-platform-enabled-cybersecurity-assessments-for-marketplaces/
[url] https://www.bugcrowd.com/news/
[url] https://www.bugcrowd.com/events/appsec-cali/
[url] https://www.bugcrowd.com/events
[url] https://www.bugcrowd.com/bugcrowd-security/
[url] https://www.bugcrowd.com/terms-and-conditions/
[url] https://www.bugcrowd.com/privacy/
[javascript] https://www.bugcrowd.com/wp-content/uploads/autoptimize/js/autoptimize_single_de6b8fb8b3b0a0ac96d1476a6ef0d147.js
[javascript] https://www.bugcrowd.com/wp-content/uploads/autoptimize/js/autoptimize_79a2bb0d9a869da52bd3e98a65b0cfb7.js
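
Output in the form shown above can be split by label before it is passed to another tool. The helper below is a rough sketch that assumes each line looks exactly like the text output above, that is, a label in square brackets, one space, then the value, with no colour escape codes. The function name extract_tag is hypothetical, and output produced with other flag combinations may need a different pattern.

```shell
# extract_tag TAG: read hakrawler-style output on stdin and print the values of
# lines labelled [TAG], with the label removed and duplicates dropped.
extract_tag() {
  # Keep only lines that start with "[TAG] " and strip that prefix,
  # then sort and de-duplicate the remaining values.
  sed -n "s/^\[$1] //p" | sort -u
}
```

For example, if the output was saved to a file named output.txt, the discovered subdomains could be listed with: extract_tag subdomain < output.txt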