The URL generator is the fastest way to generate multiple URLs that follow a common pattern. The following examples show how the variable part of a URL can change for items such as categories, search terms, and page numbers.
Categories:
- https://www.indeed.com/cmp/McDonald's
- https://www.indeed.com/cmp/FedEx
- https://www.indeed.com/cmp/Best-Buy
Search terms:
- https://www.reddit.com/search/?q=webscraping
- https://www.reddit.com/search/?q=bigdata
Pages:
- https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_pgn=1
- https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_pgn=2
- https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_pgn=3
- https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_pgn=4
How to Access the URL Generator
To access the URL generator:
- Go to Scraper -> Manage Inputs
- Click the Generate URLs button
The URL generator popup appears.
Parameter types
The URL generator supports two parameter types: List of Values and Range of Numbers.
- List of Values:
- This type accepts non-numeric values, such as search terms.
Example:
- https://www.reddit.com/search/?q=webscraping
- https://www.reddit.com/search/?q=bigdata
- https://www.reddit.com/search/?q=dataviz
The following example shows how to use the List of Values parameter type to generate a list of URLs by varying the search term.
- Click the Edit button to add the URL you want to parameterize. For instance, enter https://www.reddit.com/search/?q=webscraping in the text box.
- Select the variable portion of the URL, that is, the text that follows the equals sign, and click Add Parameter.
- For this example, webscraping is the variable portion: the search term.
- Once it is added as a parameter, webscraping changes to Parameter-1 and the options for the parameter appear.
- Now that the value is a parameter, change its values to access multiple pages. This example retrieves one page of Reddit search results for each of webscraping, bigdata, and dataviz.
- Select List of values from the dropdown list to specify the parameter type.
- Enter a comma-separated list of values in the box. For this example, enter webscraping,bigdata,dataviz in the box and click Save & Add to list.
- The number of generated URLs is 3, and the list of URLs appears in the URL preview box.
Now, click Save to save the extractor.
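As a rough illustration of what the List of Values steps produce, the Python sketch below rebuilds the Reddit search URLs from the example above. It is a minimal sketch, not the tool's implementation: `generate_from_list` is a hypothetical helper, the selected parameter is assumed to be all the text after the last equals sign (true for these example URLs), and values are assumed to be inserted verbatim, without URL encoding.

```python
def generate_from_list(url: str, values_csv: str) -> list[str]:
    """Build one URL per comma-separated value (List of Values).

    Hypothetical helper: assumes the selected parameter is the text after
    the last '=' and that it runs to the end of the URL.
    """
    # Everything up to and including the last '=' stays fixed.
    prefix = url[: url.rindex("=") + 1]
    # Split on commas, dropping surrounding whitespace and empty entries.
    values = [v.strip() for v in values_csv.split(",") if v.strip()]
    # Values are inserted verbatim; no URL encoding is applied (an assumption).
    return [prefix + value for value in values]


urls = generate_from_list(
    "https://www.reddit.com/search/?q=webscraping",
    "webscraping,bigdata,dataviz",
)
for url in urls:
    print(url)
```

Because the sketch splits on commas, a single value cannot contain a comma; the article does not say how the tool itself handles such values.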
- Range of Numbers:
- This type accepts numeric values, such as page numbers.
Example:
- https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_pgn=1
- https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_pgn=2
- https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_pgn=3
- https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_pgn=4
The following example shows how to use the Range of Numbers parameter type to generate a list of URLs by varying the page number.
- Click the Edit button to add the URL you want to parameterize. For instance, enter https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_pgn=1 in the text box.
- Select the variable portion of the URL, that is, the text that follows the equals sign, and click Add Parameter.
- In this example, 1 is the page number.
- Once it is added as a parameter, 1 changes to Parameter-1 and the options for the parameter appear.
- Select Range of numbers from the dropdown list to specify the parameter type.
- In the range boxes, enter 1 and 100 to generate pages 1 through 100.
- Set Step to 1. The step is the amount added to each number to get the next one. For example, a step of 5 accesses every fifth page (1, 6, 11, and so on).
- The number of generated URLs is 100, and the list of URLs appears in the URL preview box.
Now, click Save to save the extractor.
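The Range of Numbers settings can be sketched the same way. The hypothetical `generate_from_range` helper below assumes the end of the range is inclusive (consistent with 1 to 100 producing 100 URLs), that the step is a positive integer, and that the page number is the text after the last equals sign in the URL.

```python
def generate_from_range(url: str, start: int, end: int, step: int = 1) -> list[str]:
    """Build one URL per number from start to end, inclusive (Range of Numbers).

    Hypothetical helper: assumes the numeric parameter is the text after
    the last '=' (e.g. the 1 in _pgn=1) and that it ends the URL.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    # Everything up to and including the last '=' stays fixed.
    prefix = url[: url.rindex("=") + 1]
    # range() excludes its stop value, so add 1 to make `end` inclusive.
    return [f"{prefix}{n}" for n in range(start, end + 1, step)]


base = "https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276?_pgn=1"
pages = generate_from_range(base, 1, 100)
print(len(pages))  # 100
every_fifth = generate_from_range(base, 1, 100, step=5)
print(every_fifth[:3])  # pages 1, 6 and 11
```

Under these assumptions the number of URLs is floor((end - start) / step) + 1, so a step of 5 over 1 to 100 yields 20 URLs.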
Specifying multiple parameters
URLs can contain more than one parameter.
To specify additional parameters, repeat the selection process for each variable portion of the URL. As you change the parameter values, the list of URLs in the URL preview box below updates accordingly.
The URL list contains every combination of the parameter values.
For example, in the following list, the parameters specify three cities and three pages for each city, for a total of 9 URLs.
- https://www.indeed.com/jobs?q=&l=New+York&start=10
- https://www.indeed.com/jobs?q=&l=New+York&start=20
- https://www.indeed.com/jobs?q=&l=New+York&start=30
- https://www.indeed.com/jobs?q=&l=Chicago&start=10
- https://www.indeed.com/jobs?q=&l=Chicago&start=20
- https://www.indeed.com/jobs?q=&l=Chicago&start=30
- https://www.indeed.com/jobs?q=&l=Houston&start=10
- https://www.indeed.com/jobs?q=&l=Houston&start=20
- https://www.indeed.com/jobs?q=&l=Houston&start=30
Here:
- Parameter 1 (selected text): New+York
- Values for parameter 1: New+York,Chicago,Houston (List of Values)
- Parameter 2 (selected text): 10
- Values for parameter 2: 10 to 30, step = 10 (Range of Numbers)
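The combination rule can be sketched with Python's `itertools.product`, which yields every combination of the parameter values with the first parameter varying slowest, the same order as the Indeed list above. The `{city}` and `{start}` placeholders and the `generate_combinations` helper are hypothetical; the tool itself labels parameters Parameter-1, Parameter-2, and so on.

```python
from itertools import product


def generate_combinations(template: str, params: dict[str, list[str]]) -> list[str]:
    """Fill every combination of parameter values into `template`.

    Hypothetical helper: each parameter is marked in the template by a
    {name} placeholder. Earlier parameters vary slowest.
    """
    names = list(params)
    urls = []
    for combo in product(*(params[name] for name in names)):
        url = template
        # Substitute each placeholder with its value for this combination.
        for name, value in zip(names, combo):
            url = url.replace("{" + name + "}", value)
        urls.append(url)
    return urls


template = "https://www.indeed.com/jobs?q=&l={city}&start={start}"
urls = generate_combinations(
    template,
    {
        "city": ["New+York", "Chicago", "Houston"],         # parameter 1: list of values
        "start": [str(n) for n in range(10, 30 + 1, 10)],   # parameter 2: 10 to 30, step 10
    },
)
print(len(urls))  # 9
for url in urls:
    print(url)
```

The total is the product of each parameter's value count, here 3 × 3 = 9, so adding a parameter multiplies the list rather than extending it.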
Removing parameters
To remove a parameter, click the Delete icon to the left of the parameter definition.
Editing the URL
To change the URL, click Edit at any time.