The linkedin_jobs_crawler is a Python web crawler script built to explore crawling techniques on LinkedIn. The crawler searches for job postings (entries) that list a job poster and extracts the company name, job position, and a link to the job page.
The crawler can be modified to run in a headless browser; by default it does not, so that the user retains control over entering login information.
The following is required to use this script:
- Python 3.6 or greater
- Selenium
- Beautiful Soup 4
- Chromedriver 2.41 for browser automation
- Google Chrome or Chromium
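Before installing, it can help to confirm the environment meets the requirements above. The following stdlib-only sketch (not part of the script) checks the Python version and reports whether the two Python dependencies are importable:

```python
import sys
import importlib.util

# Sanity-check the environment against the requirements above.
assert sys.version_info >= (3, 6), "Python 3.6 or greater is required"
print("Python version OK:", sys.version.split()[0])

# Report whether Selenium and Beautiful Soup 4 are installed.
for module, package in [("selenium", "selenium"), ("bs4", "beautifulsoup4")]:
    found = importlib.util.find_spec(module) is not None
    status = "installed" if found else "missing - install with: pip3 install " + package
    print(f"{package}: {status}")
```

Chromedriver and Chrome/Chromium are system binaries and are not checked here.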
- Clone the repository to your machine using git:
git clone https://github.com/will-huynh/linkedin_jobs_crawler.git
- Go to the cloned directory on your local machine and update to the latest version using git:
Navigate to the cloned linkedin_jobs_crawler folder
git checkout master
git pull
- Download Chromedriver 2.41 and place the chromedriver executable file in the linkedin_jobs_crawler folder (the same directory as the script).
The crawler is operated from the command line. It takes a query (job position), a search location, and an output file name with the .csv extension, then writes the scraped results to /<script_dir>/output/<csv_file>.
First, navigate to the script directory. Then run the crawler with the following command, passing three required arguments via these flags:
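The output location described above can be sketched with a small helper. This is an illustrative reconstruction, not code from the script; the function name and directory handling are assumptions:

```python
import os

def output_path(script_dir, csv_file):
    # Results land in <script_dir>/output/<csv_file>, per the README.
    out_dir = os.path.join(script_dir, "output")
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if it is missing
    return os.path.join(out_dir, csv_file)

path = output_path(".", "results.csv")
print(path)
```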
python3 linkedin_jobs_crawler.py
-k or --keyword "<job_position>"
-l or --location "<location>"
-o or --output "<csv_filename>"
Some example commands would be:
python3 linkedin_jobs_crawler.py -k "engineer" -l "Vancouver, Canada" -o "output.csv"
python3 linkedin_jobs_crawler.py --keyword "developer" --location "89143" --output "results.csv"
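The flag handling above can be sketched with argparse. This is a hypothetical reconstruction of the crawler's CLI: the flag names match the README, but the script's actual parsing code may differ:

```python
import argparse

# Assumed CLI definition; mirrors the flags documented above.
parser = argparse.ArgumentParser(description="LinkedIn jobs crawler")
parser.add_argument("-k", "--keyword", required=True, help="job position to search for")
parser.add_argument("-l", "--location", required=True, help="search location (city or postal code)")
parser.add_argument("-o", "--output", required=True, help="output CSV filename")

# Parse a sample command line matching the first example above.
args = parser.parse_args(["-k", "engineer", "-l", "Vancouver, Canada", "-o", "output.csv"])
print(args.keyword, args.location, args.output)
```

Because all three arguments are marked required, omitting any flag makes argparse exit with a usage message rather than running the crawler with missing inputs.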