Urlgrab – A Golang Utility To Spider Through A Website Searching For Additional Links

A golang utility to spider through a website searching for additional links with support for JavaScript rendering.
The world's most advanced processor in the desktop PC gaming segment Can deliver ultra-fast 100+ FPS performance in the world's most popular games 6 cores and 12 processing threads bundled with the quiet AMD wraith stealth cooler max temps 95°C 4 2 G... read more
(as of January 23, 2021 - More infoProduct prices and availability are accurate as of the date/time indicated and are subject to change. Any price and availability information displayed on [relevant Amazon Site(s), as applicable] at the time of purchase will apply to the purchase of this product.)
The world's most advanced processor in the desktop PC gaming segment Can deliver ultra-fast 100+ FPS performance in the world's most popular games 8 cores and 16 processing threads, bundled with the AMD Wraith Prism cooler with color controlled LED s... read more
(as of January 23, 2021 - More infoProduct prices and availability are accurate as of the date/time indicated and are subject to change. Any price and availability information displayed on [relevant Amazon Site(s), as applicable] at the time of purchase will apply to the purchase of this product.)
The world's most advanced processor in the desktop PC gaming segment Can deliver ultra-fast 100+ FPS performance in the world's most popular games 12 cores and 24 processing threads, bundled with the AMD Wraith Prism cooler with color controlled LED ... read more
(as of January 23, 2021 - More infoProduct prices and availability are accurate as of the date/time indicated and are subject to change. Any price and availability information displayed on [relevant Amazon Site(s), as applicable] at the time of purchase will apply to the purchase of this product.)
Install
go get -u github.com/iamstoxe/urlgrab
Features
- Customizable Parallelism
- Ability to Render JavaScript (including Single Page Applications such as Angular and React)
Usage
Usage of urlgrab:
-cache-dir string
Specify a directory to utilize caching. Works between sessions as well.
-debug
Extremely verbose debugging output. Useful mainly for development.
-delay int
Milliseconds to randomly apply as a delay between requests. (default 2000)
-depth int
The maximum limit on the recursion depth of visited URLs. (default 2)
-headless
If true the browser will be displayed while crawling.
Note: Requires render-js flag
Note: Usage to show browser: --headless=false (default true)
-ignore-query
Strip the query portion of the URL before determining if we've visited it yet.
-ignore-ssl
Scrape pages with invalid SSL certificates
-js-timeout int
The amount of seconds before a request to render javascript should timeout. (default 10)
-json string
The filename where we should store the output JSON file.
-max-body int
The limit of the retrieved response body in kilobytes.
0 means unlimited.
Supply this value in kilobytes. (i.e. 10 * 1024kb = 10MB) (default 10240)
-no-head
Do not send HEAD requests prior to GET for pre-validation.
-output-all string
The directory where we should store the output files.
-proxy string
The SOCKS5 proxy to utilize (format: socks5://127.0.0.1:8080 OR http://127.0.0.1:8080).
Supply multiple proxies by separating them with a comma.
-random-agent
Utilize a random user agent string.
-render-js
Determines if we utilize a headless chrome instance to render javascript.
-root-domain string
The root domain we should match links against.
If not specified it will default to the host of --url.
Example: --root-domain google.com
-threads int
The number of threads to utilize. (default 5)
-timeout int
The amount of seconds before a request should timeout. (default 10)
-url string
The URL where we should start crawling.
-urls string
A file path that contains a list of urls to supply as starting urls.
Requires --root-domain flag.
-user-agent string
A user agent such as (Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0).
-verbose
Verbose output
Author
Devin Stokes
- Twitter: @DevinStokes
- Github: @IAmStoxe
Download Urlgrab
If you like the site, please consider joining the telegram channel or supporting us on Patreon using the button below.