- Automatic: You can let FetchFox determine the starting URLs for you.
- Explicit: You can tell FetchFox which URLs to start crawling from.
Automatically determine the starting URLs
If you do not pass startUrls, FetchFox generates a small seed set from your pattern:
- The origin (for example, https://example.com)
- Path prefixes derived from your pattern
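To make the idea of a seed set concrete, here is a minimal sketch of how an origin and path prefixes could be derived from a pattern. The function name deriveSeedUrls is hypothetical, and so are two assumptions it relies on: that * is the wildcard character, and that the wildcard appears in the path. This is an illustration only, not FetchFox's actual algorithm, and the seeds FetchFox really generates may differ.

```typescript
// Hypothetical helper for illustration; not part of FetchFox.
// Assumes "*" is the wildcard and that it appears in the path, not the host.
function deriveSeedUrls(pattern: string): string[] {
  // Keep only the literal part of the pattern, up to the first wildcard.
  const wildcardIndex = pattern.indexOf("*");
  const literal = wildcardIndex === -1 ? pattern : pattern.slice(0, wildcardIndex);

  const url = new URL(literal);
  const seeds: string[] = [url.origin];

  // Drop any partial trailing segment (for example "post-" in "/blog/post-*"),
  // then split the remaining path into complete segments.
  const lastSlash = url.pathname.lastIndexOf("/");
  const segments = url.pathname
    .slice(0, lastSlash)
    .split("/")
    .filter((segment) => segment.length > 0);

  // Add one seed per cumulative path prefix: /blog, then /blog/posts, and so on.
  let prefix = url.origin;
  for (const segment of segments) {
    prefix += "/" + segment;
    seeds.push(prefix);
  }
  return seeds;
}

console.log(deriveSeedUrls("https://example.com/blog/posts/*"));
```

Under these assumptions, a pattern with a deeper literal path yields more prefixes, while a pattern such as https://example.com/* yields only the origin.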
Explicitly setting the starting URLs
Use startUrls to explicitly define the seed URLs for a crawl.
Setting startUrls is helpful when you want to crawl specific parts of a large site. It is especially useful in combination with maxDepth, which caps the depth of a crawl.
For example, suppose you are scraping commits from specific GitHub repos. You can pass the URLs of the target repos in startUrls, then set maxDepth: 0.
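The GitHub scenario could be expressed as options along the following lines. This is a sketch under stated assumptions: the CrawlOptions interface, the commitCrawlOptions helper, the * wildcard syntax in the pattern, and the example-org repo names are all hypothetical. Only the names pattern, startUrls, and maxDepth come from the text above. Check the FetchFox reference for the exact call that accepts these options.

```typescript
// Hypothetical option shape for illustration; only the field names
// pattern, startUrls, and maxDepth are taken from the documentation text.
interface CrawlOptions {
  pattern: string;
  startUrls?: string[];
  maxDepth?: number;
}

// Build crawl options that seed the crawl with each repo's commits page.
// The repo identifiers are expected in "owner/name" form.
function commitCrawlOptions(repos: string[]): CrawlOptions {
  return {
    // Assumed wildcard syntax for matching individual commit pages.
    pattern: "https://github.com/*/*/commit/*",
    // One explicit seed URL per target repo.
    startUrls: repos.map((repo) => `https://github.com/${repo}/commits`),
    // Depth limit from the example above.
    maxDepth: 0,
  };
}

// Placeholder repo names, not real targets.
const options = commitCrawlOptions(["example-org/repo-a", "example-org/repo-b"]);
console.log(JSON.stringify(options, null, 2));
```

Keeping the seed list explicit like this means that adding or removing a repo changes only the input array, not the pattern or the depth limit.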