Automatically pick a proxy

Many sites can only be accessed through a proxy. FetchFox lets you specify which proxy to use for any operation with the proxy parameter, but working out which proxy suits which site can be difficult and time-consuming. Instead, you can ask FetchFox to suggest a proxy automatically for any given site.

The auto proxy parameter

All of our endpoints (including scrape, crawl, and extract) accept auto as a value for the proxy parameter. This tells FetchFox to pick the most appropriate proxy automatically on a per-domain basis. Below is an example of calling the crawl endpoint with the automatic proxy picker.

curl -X POST https://api.fetchfox.ai/api/crawl \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $FETCHFOX_API_KEY" \
-d '{
    "pattern":"https://pokemondb.net/pokedex/*",
    "proxy": "auto"
}'

When FetchFox sees auto as the proxy value, it checks its historical data for the domain it is visiting. If historical data exists, FetchFox uses it to pick the appropriate proxy. If not, FetchFox runs some trials to generate initial data for that domain. As a result, the first call with auto for a particular domain may be a little slower than usual.

Suggest a proxy

You can also use our standalone endpoint to pick a proxy. It has one required parameter, url, which is the URL you are trying to access. Below is an example of how to call this endpoint.

curl -X POST https://api.fetchfox.ai/api/proxy/suggest \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $FETCHFOX_API_KEY" \
-d '{"url":"https://pokemondb.net/pokedex/national"}'

The response to this call will look something like this:

{
  "job_id": "g461eql806",
  "results": {
    "best": {
      "name": "datacenter",
      "costPerGb": 0.01
    },
    "stats": {
      "datacenter": {
        "name": "datacenter",
        "total": 10,
        "success": 10,
        "error": 0,
        "captcha": 0,
        "login": 8,
        "runtime": 9951.2
      },
      ... more historical stats ...
    }
  },
  "metrics": { ... cost and usage metrics ... }
}

The response contains a suggested proxy in results.best. This is the cheapest proxy that we found to reliably access the target domain. We also return statistics on past visits to that domain in results.stats, showing how many requests from each proxy succeeded or failed. Note that the suggest endpoint incurs a cost the first time it runs for any domain, because FetchFox needs to visit a sampling of URLs on the target domain and evaluate the results using AI.
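As a rough illustration of reading this response, the Python sketch below pulls the suggested proxy out of results.best and computes a per-proxy success rate from results.stats. The sample dictionary copies only the fields shown in the example response; the summarize_suggestion helper name and the success / total ratio are assumptions made for illustration, not part of the FetchFox API.

```python
import json

# Trimmed copy of the sample response (elided fields are left out).
SAMPLE_RESPONSE = json.loads("""
{
  "job_id": "g461eql806",
  "results": {
    "best": {"name": "datacenter", "costPerGb": 0.01},
    "stats": {
      "datacenter": {
        "name": "datacenter",
        "total": 10,
        "success": 10,
        "error": 0,
        "captcha": 0,
        "login": 8,
        "runtime": 9951.2
      }
    }
  }
}
""")


def summarize_suggestion(response):
    """Return the suggested proxy name and a success rate per proxy.

    Hypothetical helper: the rate is computed as success / total,
    which is an assumption about how to read the stats.
    """
    results = response["results"]
    rates = {}
    for name, stat in results["stats"].items():
        total = stat["total"]
        # Guard against proxies with no recorded visits.
        rates[name] = stat["success"] / total if total else 0.0
    return results["best"]["name"], rates


best, rates = summarize_suggestion(SAMPLE_RESPONSE)
print(best, rates)
```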

How it works

The first time you call suggest for any domain, FetchFox visits a sampling of URLs on that domain, and checks the results using AI. We ask the AI if it noticed any errors, captchas, or other bot blocks on the page for each proxy. These results are recorded in our historical tracking database. Once we have some historical data, FetchFox picks the cheapest proxy that can reliably access a site. Reliable access means that most visits to that domain with a given proxy succeed.
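The selection rule described above can be sketched in Python as follows. This only illustrates the idea of "cheapest proxy that reliably works" and is not FetchFox's actual implementation: the 0.5 threshold standing in for "most visits succeed", the residential proxy name, and its cost are all hypothetical.

```python
def pick_cheapest_reliable(stats, cost_per_gb, min_success_rate=0.5):
    """Pick the cheapest proxy whose success rate exceeds the threshold.

    stats maps proxy name -> {"total": int, "success": int}.
    cost_per_gb maps proxy name -> cost per GB.
    Returns None when no proxy qualifies.
    """
    reliable = []
    for name, stat in stats.items():
        if stat["total"] == 0:
            continue  # no data yet, so reliability is unknown
        rate = stat["success"] / stat["total"]
        if rate > min_success_rate:
            reliable.append(name)
    if not reliable:
        return None
    return min(reliable, key=lambda name: cost_per_gb[name])


# Hypothetical data: only the datacenter cost (0.01) appears in the docs.
stats = {
    "datacenter": {"total": 10, "success": 3},
    "residential": {"total": 10, "success": 9},
}
cost_per_gb = {"datacenter": 0.01, "residential": 0.50}

print(pick_cheapest_reliable(stats, cost_per_gb))
```

In this made-up data the cheaper datacenter proxy fails most visits, so the more expensive one is chosen; when both are reliable, the cheaper one wins.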

Historical data and forced runs

If FetchFox has enough historical data, then calls to the suggest endpoint do not have to make any new visits. These calls will be much faster and cheaper. However, if you want to force FetchFox to collect new data, use the force_run and count parameters. These will tell FetchFox to make count new visits to the target domain using each proxy, regardless of historical data.
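A forced run might look like the Python sketch below, which builds the same request as the curl example for the suggest endpoint and adds the two parameters. This page names force_run and count but does not show their value types, so passing a boolean and an integer here is an assumption; build_forced_suggest_request is a hypothetical helper name.

```python
import json
import os
import urllib.request

SUGGEST_URL = "https://api.fetchfox.ai/api/proxy/suggest"


def build_forced_suggest_request(url, api_key, count=5):
    """Build (but do not send) a suggest request that forces new visits."""
    payload = {
        "url": url,
        "force_run": True,  # assumed to be a boolean flag
        "count": count,     # assumed to be an integer: new visits per proxy
    }
    return urllib.request.Request(
        SUGGEST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_forced_suggest_request(
    "https://pokemondb.net/pokedex/national",
    os.environ.get("FETCHFOX_API_KEY", ""),
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
print(request.full_url, request.data.decode("utf-8"))
```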
