On the “Manage Inputs” tab of a scraper, you can manage the list of URLs that the scraper uses as input when it starts a run.
You can add URLs manually, import them from a CSV file, extract them from other pages with Chained Extractors, or generate URLs that follow a similar pattern with the URL Generator.
Elements of the ‘Manage Inputs’ View
Case 1: Dropdown set to an explicit list of URLs
- Input source: Dropdown to set whether the scraper uses URLs from an explicit list of URLs provided or URLs extracted by another scraper.
- Show invalid URLs: Shows only the invalid URLs present in the current list of URLs.
- Remove Invalid URLs: Removes any invalid URLs from the list.
- Remove all URLs: Removes all the URLs from the list to start over.
- Import URLs: Import a list of URLs from a CSV file.
- Remove Duplicate URLs: Removes any duplicate URLs from the list.
- Download URLs: Download a list of the URLs in CSV format.
- Generate URLs: Create URLs using the URL Generator.
- List view: Shows all of the URLs currently added.
- URLs Input: Add URLs manually by pasting them here.
- Save: This saves any changes made to the URL list. When you add/remove/update URLs using the URLs Input, the changes will not be saved until you click ‘Save’.
- Run Scraper: Starts a scraper run. If you have unsaved changes, this button will be disabled until you save your changes.
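To make the list-cleaning buttons above more concrete, here is a minimal Python sketch of what importing, filtering, and de-duplicating a URL list could look like. It is an illustration only: the function names are hypothetical, the CSV is assumed to hold one URL in the first column of each row, and the validity rule (an http or https scheme plus a host) is an assumption, not necessarily the rule the product applies.

```python
import csv
import io
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Assumed rule: http(s) scheme and a non-empty host."""
    try:
        parts = urlparse(url.strip())
    except ValueError:
        # urlparse can raise on malformed input such as a broken IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def read_urls_from_csv(text: str) -> list[str]:
    """Import URLs: take the first column of each non-empty row (assumed layout)."""
    rows = csv.reader(io.StringIO(text))
    return [row[0].strip() for row in rows if row and row[0].strip()]


def show_invalid(urls: list[str]) -> list[str]:
    """Show invalid URLs: keep only the entries that fail the check."""
    return [u for u in urls if not is_valid_url(u)]


def remove_invalid(urls: list[str]) -> list[str]:
    """Remove Invalid URLs: keep only the entries that pass the check."""
    return [u for u in urls if is_valid_url(u)]


def remove_duplicates(urls: list[str]) -> list[str]:
    """Remove Duplicate URLs: keep the first occurrence, preserving order."""
    return list(dict.fromkeys(urls))
```

For example, cleaning an imported list before saving it might chain the helpers as `remove_duplicates(remove_invalid(read_urls_from_csv(text)))`.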
Case 2: Dropdown set to URLs from another extractor
- Input source: Dropdown to set whether the scraper uses URLs from an explicit list of URLs provided or URLs extracted by another scraper.
- Parent Scraper: Scraper that extracts the list of URLs to use.
- Input Column: The column in the parent scraper's output that contains the URLs to use (e.g. detail page URLs).
- Always run the parent first: On/off toggle to automatically trigger the current child scraper after its parent scraper completes a crawl run.
- Save Chaining: This saves any changes made to the input source.
- Run Chaining:
  - If ‘Always run the parent first’ is On:
    - This triggers the parent extractor to run first, and then runs the current child extractor.
  - If ‘Always run the parent first’ is Off:
    - This starts the child scraper run, taking its input URLs from the parent scraper.
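The two ‘Run Chaining’ behaviours can be sketched in Python as follows. This is a rough model under stated assumptions, not the product's implementation: `run_chaining` and its callback parameters are hypothetical names, parent results are assumed to be a list of dictionaries keyed by column name, and with the toggle Off the child is assumed to read the parent's most recently stored results.

```python
from typing import Callable

Rows = list[dict[str, str]]


def run_chaining(
    run_parent: Callable[[], Rows],          # hypothetical: runs the parent, returns fresh rows
    latest_parent_rows: Callable[[], Rows],  # hypothetical: returns already stored parent rows
    run_child: Callable[[list[str]], list[str]],  # hypothetical: runs the child on input URLs
    input_column: str,
    always_run_parent_first: bool,
) -> list[str]:
    if always_run_parent_first:
        # Toggle On: the parent runs first, then the child uses its output.
        rows = run_parent()
    else:
        # Toggle Off (assumption): reuse the parent's existing results.
        rows = latest_parent_rows()

    # Take input URLs from the selected Input Column, skipping empty cells.
    urls = [row[input_column] for row in rows if row.get(input_column)]
    return run_child(urls)
```

Passing the parent and child runs in as callbacks keeps the sketch independent of any real API; only the order of the calls and the choice of Input Column are being illustrated.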