Limit Hosts crawled #37

Open
Vaccam opened this issue Nov 5, 2018 · 4 comments
Comments

@Vaccam

Vaccam commented Nov 5, 2018

I am running a crawl on my company's intranet. Is there a way to limit which hosts it crawls? It seems to be crawling every connection on our intranet.
[image: crawlhosts]

@kenkenchow

You can add a url_match_rule for url_filter in gopa.yml.

@Vaccam
Author

Vaccam commented Nov 6, 2018

Thank you for your response. After I added this to the yml file and stopped and restarted the server, the crawl still continues to crawl the other hosts. Do I need to do something else to get the existing crawl to stop?

Thanks,

Michael

@Vaccam
Author

Vaccam commented Nov 6, 2018

Is this what I want, as an example:

[image: yml]

@kenkenchow

host_match_rule:
  must:
    prefix: []
    contain: [url.that.iwant]

I put it in array format and it works.
Please check whether gopa is still running after you stopped it. If it is still running, kill it with kill -9 <pid>.
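
To make the intent of the host_match_rule above concrete, here is a rough Python sketch of how a "must" rule with prefix and contain lists could decide whether a host gets crawled. This is not gopa's code. The function name host_allowed, the RULE dictionary, and the exact semantics (a host passes if it starts with any listed prefix or contains any listed substring, and everything passes when both lists are empty) are assumptions made for illustration; gopa's actual evaluation may differ.

```python
from urllib.parse import urlparse

# Hypothetical rule mirroring the shape of the host_match_rule quoted above.
RULE = {"must": {"prefix": [], "contain": ["url.that.iwant"]}}


def host_allowed(url, rule=RULE):
    """Return True if the URL's host satisfies the rule's `must` section.

    Assumed semantics (not taken from gopa): the host passes when it starts
    with any listed prefix or contains any listed substring.
    """
    host = urlparse(url).hostname or ""
    must = rule.get("must", {})
    prefixes = must.get("prefix") or []
    contains = must.get("contain") or []
    if not prefixes and not contains:
        return True  # no constraints configured, so nothing is filtered out
    return any(host.startswith(p) for p in prefixes) or any(
        c in host for c in contains
    )


if __name__ == "__main__":
    for candidate in ("http://docs.url.that.iwant/page", "http://other.intranet.local/"):
        print(candidate, host_allowed(candidate))
```

With a rule like this, only hosts whose name contains url.that.iwant would be kept, which matches the goal of keeping the crawl off the rest of the intranet.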
