Set crawl priorities

You can use the priority parameter to control which pages FetchFox visits during a crawl, and in which order it visits them. The priority parameter has four fields, each of which can be a list of URL patterns.

only can be used to specifically whitelist certain URL patterns. If this field is defined, then FetchFox will only visit URLs that match at least one of those URL patterns.
skip can be used to specifically blacklist certain URL patterns. If this field is defined, then URLs matching any of the patterns in the list will be skipped. They will be skipped even if they match another pattern in the priority definition.
high can be used to mark certain URL patterns as high priority. If this field is defined, then FetchFox will prioritize visiting URLs that match any of the patterns in the list.
low can be used to mark certain URL patterns as low priority. If this field is defined, then FetchFox will place low priority on visiting URLs that match any of the patterns in the list.

As an example, suppose you want to visit only URLs in the shopping category, but you want to skip jeans and pants. You also want to place higher priority on shirts, and lower priority on socks. You can use priority to define all of these preferences, as shown in the example below.

curl -X POST https://api.fetchfox.ai/api/crawl \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $FETCHFOX_API_KEY" \
-d '{
    "pattern":"https://example.com/shopping/*",
    "priority": {
      "only": [
        "https://example.com/shopping/*"
      ],
      "skip": [
        "https://example.com/shopping/jeans/*",
        "https://example.com/shopping/pants/*"
      ],
      "high": [
        "https://example.com/shopping/shirts/*"
      ],
      "low": [
        "https://example.com/shopping/socks/*"
      ]
    },
    "max_visits": 50
}'
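
To make the interaction between the four fields concrete, here is a minimal local sketch in shell that classifies a URL using the same patterns as the request above. It is not FetchFox's implementation. The matches_any and classify functions are hypothetical, and the sketch assumes that * matches any run of characters (including /) and that high wins when a URL matches both high and low. The only precedence rule taken from this page is that skip overrides every other field.

```shell
#!/bin/sh
# Hypothetical local model of the priority rules; not part of the FetchFox API.

# Return 0 if the URL matches at least one of the given wildcard patterns.
matches_any() {
  _url=$1
  shift
  for _pattern in "$@"; do
    # Leaving $_pattern unquoted makes the shell treat * as a wildcard.
    case $_url in
      $_pattern) return 0 ;;
    esac
  done
  return 1
}

# Print skip, high, low, or normal for a single URL.
classify() {
  _candidate=$1
  if matches_any "$_candidate" \
      "https://example.com/shopping/jeans/*" \
      "https://example.com/shopping/pants/*"; then
    # skip always wins, even if another field also matches.
    echo skip
  elif ! matches_any "$_candidate" "https://example.com/shopping/*"; then
    # only is defined, so anything outside it is never visited.
    echo skip
  elif matches_any "$_candidate" "https://example.com/shopping/shirts/*"; then
    # Assumption: high is checked before low.
    echo high
  elif matches_any "$_candidate" "https://example.com/shopping/socks/*"; then
    echo low
  else
    echo normal
  fi
}
```

For example, classify "https://example.com/shopping/shirts/blue-oxford" prints high, while classify "https://example.com/shopping/jeans/slim" prints skip, because the skip list takes precedence over the only list.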

Defining priorities is useful if you are targeting only a small section of a large site, or if you want to avoid wasting time on certain parts of a site. Keep in mind that priority controls which pages FetchFox visits. The parameter does not affect which URLs are found in results.hits.
