Web Crawler Overview
Onna's web crawler was created to index web pages.
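At its core, indexing a web page means fetching it, recording its text, and queuing the links it contains for later visits. The sketch below is a generic illustration of that link-discovery step using only the Python standard library; it is not Onna's actual implementation, and the example page and URLs are hypothetical.

```python
# Minimal sketch of a crawler's link-discovery step (generic
# illustration only -- not Onna's implementation).
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

# Hypothetical page content for demonstration.
page = '<a href="/about">About</a> <a href="https://example.com/docs">Docs</a>'
parser = LinkExtractor("https://example.com")
parser.feed(page)
# parser.links now holds the URLs a crawler would visit next.
```

A real crawler would repeat this cycle (fetch, index, extract links) for each discovered URL, typically tracking visited pages to avoid loops.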
Connector Features
- Authorized connection required? No
- Is identity mapping supported? No
- Audit logs available? Yes
- Admin access? No
- Supports a full archive? No
- Custodian-based collections? No
- Preserve in place with ILH? No
- Resumable sync supported? No
- Supports Onna preservation? No
- Syncs future users automatically? No
- Sync modes supported:
- Is file versioning supported? No
Types of Data Collected | Metadata Collected
Web Crawler Considerations
- The web crawler does not currently support password-protected or CAPTCHA-protected websites.
- Files linked from a web page cannot be collected in their native format. The web crawler captures links as embedded references within the page, so it cannot pull the linked files themselves.
Web Crawler Requirements
- When adding a new Web Crawler sync, you must enter URLs with the protocol included (http:// or https://).
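If you are preparing a list of URLs programmatically before adding them as syncs, a quick check that each one includes a supported protocol can catch omissions early. This is a minimal sketch using Python's standard library; the function name is our own, not part of any Onna tooling.

```python
from urllib.parse import urlparse

def has_supported_scheme(url: str) -> bool:
    """Check that a URL includes the protocol (http or https),
    as required when adding a Web Crawler sync."""
    return urlparse(url).scheme in ("http", "https")

# "example.com" would be rejected because it lacks a protocol;
# "https://example.com" would pass.
```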
How to Connect and Collect Using Web Crawler
To create a new Web Crawler collection, follow the steps below:
Step 1: Click on ‘Workspaces’ in the main menu (a), then click on the workspace where you’d like to add a new sync (b).
Step 2: Click on the ‘+’ icon in the upper right corner to add a new source.
Step 3: Select the Web Crawler connector from your list of available connectors.
Step 4: To configure your sync, start by entering a name for your source in the ‘Name’ field (a). Then, enter the URL you want to collect from (b). Finally, click the blue ‘Done’ button (c).
Step 5: You’ll now see your new source appear alphabetically in the list of ‘Connected sources’ in your workspace.