Using URL patterns
A basic crawl takes just the pattern parameter as input. This parameter is a URL pattern.
A URL pattern is a string that may contain some * and ** operators. Both of these are wildcards, but they operate in slightly different ways.
- * matches any character except /
- ** matches any character including /
- Pattern: https://example.com/a/*
- Pattern: https://example.com/a/**
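To make the difference between the two wildcards concrete, here is a minimal Python sketch that translates a URL pattern into a regular expression. It is an illustration only, not the crawler's actual matcher: the helper names pattern_to_regex and matches are hypothetical, and the sketch assumes that each wildcard matches zero or more characters and that a pattern must match the whole URL.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Hypothetical helper: translate a URL pattern into a compiled regex.

    Assumptions (not confirmed by the crawler's documentation):
    each wildcard matches zero or more characters, and the pattern
    must match the entire URL.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # ** matches any characters, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # * matches any characters except "/"
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # everything else is literal
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, url: str) -> bool:
    """Return True if the whole URL matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(url) is not None


# * stops at the next slash, so it matches one path segment only.
print(matches("https://example.com/a/*", "https://example.com/a/b"))
print(matches("https://example.com/a/*", "https://example.com/a/b/c"))

# ** crosses slashes, so it also matches deeper paths.
print(matches("https://example.com/a/**", "https://example.com/a/b/c"))
```

Under these assumptions, https://example.com/a/* accepts https://example.com/a/b but rejects https://example.com/a/b/c, while https://example.com/a/** accepts both.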
The results.hits section contains all the matching URLs.
In the example above, you’ll notice some unwanted URLs, like https://pokemondb.net/pokedex/game/legends-arceus. We were looking only for URLs matching Pokemon characters, not games. This is a good time to use the * operator, which does not match slashes. Our updated call looks like this: