When you crawl for URLs using a pattern, FetchFox needs a set of URLs to start at. Those URLs can be set in one of two ways:
- Automatic: You can let FetchFox determine the starting URLs for you.
- Explicit: You can tell FetchFox which URLs to start crawling from.
Automatically determine the starting URLs
If you do not pass startUrls, FetchFox generates a small seed set from your pattern:
- The origin (for example, https://example.com)
- Path prefixes derived from your pattern
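As a rough illustration of the automatic seeding described above, the TypeScript sketch below derives an origin plus path prefixes from a pattern. It assumes that a pattern is a full URL in which * marks a wildcard path segment, and that prefixes stop at the first wildcard segment. The function name deriveSeedUrls is hypothetical, and FetchFox's actual seed generation may differ.

```typescript
// Hypothetical sketch: approximates how seed URLs could be derived from a
// crawl pattern. Assumes "*" marks a wildcard path segment; this is NOT
// FetchFox's actual implementation.
function deriveSeedUrls(pattern: string): string[] {
  const url = new URL(pattern);
  const origin = url.origin; // e.g. "https://example.com"

  // Split the path into non-empty segments, e.g. ["blog", "*", "posts", "*"].
  const segments = url.pathname.split("/").filter((s) => s.length > 0);

  // Always seed with the origin.
  const seeds: string[] = [origin];

  // Add each literal path prefix until the first wildcard segment.
  let prefix = "";
  for (const segment of segments) {
    if (segment.includes("*")) break;
    prefix += "/" + segment;
    seeds.push(origin + prefix);
  }
  return seeds;
}

console.log(deriveSeedUrls("https://example.com/blog/*/posts/*"));
```

Under these assumptions, the pattern https://example.com/blog/*/posts/* yields https://example.com and https://example.com/blog as seeds.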
Explicitly setting the starting URLs
Use startUrls to explicitly define the seed URLs for a crawl.
Setting startUrls is helpful for crawling specific parts of a large site. It is especially useful with maxDepth, which limits how deep a crawl goes.
For example, suppose you are scraping commits from specific GitHub repos. You can pass the target repos in startUrls, then set maxDepth: 0.
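The GitHub example can be sketched as a set of crawl options. This is a hypothetical TypeScript shape: startUrls and maxDepth are the option names used on this page, but the pattern field name, the CrawlOptions interface, the commitCrawlOptions helper, the commit-page pattern, and the repo names are illustrative assumptions rather than the FetchFox SDK's actual types.

```typescript
// Hypothetical options shape. startUrls and maxDepth come from this page;
// the interface itself and the "pattern" field name are assumptions.
interface CrawlOptions {
  pattern: string;
  startUrls?: string[];
  maxDepth?: number;
}

// Hypothetical helper: builds options that seed the crawl with each repo's
// commit list instead of letting FetchFox choose seed URLs automatically.
function commitCrawlOptions(repos: string[]): CrawlOptions {
  return {
    // Assumed wildcard pattern for individual GitHub commit pages.
    pattern: "https://github.com/*/*/commit/*",
    // One explicit start URL per "owner/name" repo.
    startUrls: repos.map((repo) => `https://github.com/${repo}/commits`),
    // Limits crawl depth; 0 is the value suggested on this page for this case.
    maxDepth: 0,
  };
}

// Illustrative repo names, not real targets.
const options = commitCrawlOptions(["example-org/repo-a", "example-org/repo-b"]);
console.log(JSON.stringify(options, null, 2));
```

Keeping the seed list explicit means the crawl covers only the repos you name, rather than seeds derived from the github.com origin.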