Extract data from URLs
The extract endpoint takes URLs and outputs structured data
FetchFox’s extract endpoint takes a list of URLs and item template as input, and returns structured data following the item template. The item template can be a dictionary of key value pairs, or a string. You can extract a one item per URL, or many many items per URL.
A simple extraction
The extract endpoint has two required parameters:
urls
specifies which URLs to target for data extraction.template
specifies the output format for each item.
The template can be a dictionary or a string.
If template is a dictionary, the keys in the dictionary define the keys of each output item. The values in your template dictionary should describe what data you want in that field. The entire template dictionary will be passed to the AI, which will use it to do data extraction, so you can include helpful information like how to get the data, the format for that field, and so on.
If the template is a string, FetchFox will use that string to automatically determine the keys in the output items.
Whether you template
is a dictionary or a string, FetchFox will establish a JSON schema for your output items. This schema will be returned in the artifacts
section of the response.
Below is an example of calling the extract endpoint with a template.
The response to this call will look something like this:
Extracting multiple items per URL
By default, FetchFox extracts one item for each URL you pass in. Sometimes, a page contains multiple items. You can tell FetchFox to extract all of them by setting the per_page
parameter to many
.
Below is an example of extract multliple items from a single page.
The response to this call will look something like this:
Note the divide
artifact. FetchFox when you extract multiple items per page, FetchFox uses a CSS selector to divide the page into pieces. The CSS selector it used is included as the divide
artifact, along with some AI chain of thought.