
Fatal error when starting crawl #53

Open
nvanderperren opened this issue Oct 13, 2020 · 1 comment

Comments


nvanderperren commented Oct 13, 2020

Are you submitting a bug report or a feature request?

Bug report

What is the current behavior?

I get an error when I try to start a crawl. This is the error:

```
Running Crawl From Config File configurations/social-media.json
Crawler Operating In undefined mode
Crawler Will Be Preserving 2 Seeds
Crawler Will Be Generating WARC Files Using the filenamified url
Crawler Generated WARCs Will Be Placed At warcs
Crawler Is Connecting To Chrome On Host localhost
Crawler Is Connecting To Chrome On Port 9222
Crawler Will Be Waiting At Maximum For Navigation To Happen For 8s
Crawler Will Be Waiting After For 2 inflight requests
Crawler Will Be Generating WARC Files Using the filenamified url
Crawler Will Be Generating WARC Files Using the filenamified url
A Fatal Error Occurred
  TypeError: Cannot read property 'length' of undefined

  - chromeFinder.js:275 Function.findChromeDarwin
    /Users/nastasia/Developer/Squidwarc/lib/launcher/chromeFinder.js:275:20

  - chrome.js:90 async Function.launch
    /Users/nastasia/Developer/Squidwarc/lib/launcher/chrome.js:90:28

  - chrome.js:143 async ChromeCrawler.init
    /Users/nastasia/Developer/Squidwarc/lib/crawler/chrome.js:143:22

  - chromeRunner.js:143 async chromeRunner
    /Users/nastasia/Developer/Squidwarc/lib/runners/chromeRunner.js:143:3

  - index.js:31 async runner
    /Users/nastasia/Developer/Squidwarc/lib/runners/index.js:31:5
```

This is my configuration file:

```json
{
  "mode": "page-only",
  "depth": 1,
  "seeds": [
    "http://www.facebook.com/nastyvdp",
    "http://www.twitter.com/nvanderperren"
  ],
  "warc": {
    "naming": "url",
    "append": "true",
    "output": "warcs"
  },
  "connect": {
    "launch": true,
    "host": "localhost",
    "port": 9222,
    "userDataDir": "/Users/nastasia/Library/Application Support/Google/Chrome"
  },
  "crawlControl": {
    "globalWait": 60000,
    "inflightIdle": 1000,
    "numInflight": 2,
    "navWait": 8000
  }
}
```

Because the log says the mode is undefined, I also tried placing `mode` under `crawlControl`, as suggested in issue #50, but that doesn't solve the problem.

What is the expected behavior?

The crawl starts.

What's your environment?

node v14.12.0
Squidwarc: current master
macOS High Sierra 10.13.6
Chrome Version 86.0.4240.80 (Official Build) (x86_64)

Other information

I don't have this issue when I use Puppeteer.

@nvanderperren nvanderperren changed the title Fatal erro when starting crawl Fatal error when starting crawl Oct 13, 2020

blzbrg commented Mar 5, 2023

I think this is just a typo in the macOS version of the browser-finding code. Here is the check that fails, `if (sortedExes.length > 0) {`, in context:

```javascript
let sortedExes = installations
  // assign priorities
  .map(inst => {
    for (const pair of priorities) {
      if (pair.regex.test(inst)) {
        return { path: inst, weight: pair.weight }
      }
    }
    return { path: inst, weight: defaultPriority }
  })
  // sort based on priorities
  .sort((a, b) => b.weight - a.weight)
  // remove priority flag
  .map(pair => pair.path)[0] // <=== this [0] is only in the Mac version of this function
if (sortedExes.length > 0) {
  return sortedExes[0]
}
```

I will try to get access to a Mac to test changing it to `.map(pair => pair.path)`.
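To illustrate why the stray `[0]` produces exactly this error, here is a minimal standalone sketch. The function names `rankInstallations`, `pickBuggy` and `pickFixed`, the `DEFAULT_PRIORITY` value, and the example `priorities` and paths are hypothetical stand-ins, not Squidwarc's actual code; only the ranking chain is taken from the quoted `findChromeDarwin` code, and it assumes `installations` is an array of executable paths. With an empty array, `[][0]` is `undefined`, so reading `.length` throws a `TypeError`; with a non-empty array, `sortedExes` becomes a single path string, so `sortedExes[0]` returns only its first character.

```javascript
'use strict'

// Hypothetical default weight; the real value lives in lib/launcher/chromeFinder.js.
const DEFAULT_PRIORITY = 10

// Sort candidate executable paths by descending regex-assigned weight
// (same chain as the quoted code, minus the trailing [0]).
function rankInstallations (installations, priorities) {
  return installations
    .map(inst => {
      for (const pair of priorities) {
        if (pair.regex.test(inst)) {
          return { path: inst, weight: pair.weight }
        }
      }
      return { path: inst, weight: DEFAULT_PRIORITY }
    })
    .sort((a, b) => b.weight - a.weight)
    .map(pair => pair.path)
}

// Mirrors the quoted macOS code: [0] picks one element too early.
function pickBuggy (installations, priorities) {
  const sortedExes = rankInstallations(installations, priorities)[0]
  if (sortedExes.length > 0) { // throws when sortedExes is undefined
    return sortedExes[0] // returns the first *character* of the path
  }
}

// Proposed fix: keep the whole array and index it only after the length check.
function pickFixed (installations, priorities) {
  const sortedExes = rankInstallations(installations, priorities)
  if (sortedExes.length > 0) {
    return sortedExes[0]
  }
  return undefined
}

// Hypothetical example data.
const priorities = [
  { regex: /Google Chrome Canary\.app/, weight: 50 },
  { regex: /Google Chrome\.app/, weight: 100 }
]
const found = [
  '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary',
  '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
]

console.log(pickFixed(found, priorities))
// → /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
console.log(pickBuggy(found, priorities)) // → /
console.log(pickFixed([], priorities)) // → undefined
try {
  pickBuggy([], priorities)
} catch (err) {
  console.log(err instanceof TypeError) // → true
}
```

In this sketch, `pickFixed([])` returns `undefined` instead of throwing, so a caller would still have to handle the case where no Chrome installation is found.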
