The linkedin_jobs_crawler is a Python web crawler script built to explore crawling techniques on LinkedIn. The crawler searches for job postings (entries) that list a job poster and extracts the company name, job position, and a link to the job page.
The crawler can be modified to run in a headless browser; by default it does not, so that the user retains control over entering login information.
The following is required to use this script:
- Python 3.6 or greater
- Selenium
- Beautiful Soup 4
- Chromedriver 2.41 for browser automation
- Google Chrome or Chromium
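Before installing, it can help to confirm the environment meets the requirements above. The following stdlib-only sketch (not part of the script) checks the Python version and reports whether the two Python dependencies are importable:

```python
import sys
import importlib.util

# Sanity-check the environment against the requirements above.
assert sys.version_info >= (3, 6), "Python 3.6 or greater is required"
print("Python version OK:", sys.version.split()[0])

# Report whether Selenium and Beautiful Soup 4 are installed.
for module, package in [("selenium", "selenium"), ("bs4", "beautifulsoup4")]:
    found = importlib.util.find_spec(module) is not None
    status = "installed" if found else "missing - install with: pip3 install " + package
    print(f"{package}: {status}")
```

Chromedriver and Chrome/Chromium are system binaries and are not checked here.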
- Clone the repository to your machine using git:
git clone https://github.com/will-huynh/linkedin_jobs_crawler.git
- Go to the cloned directory on your local machine and update to the latest version using git:
Navigate to the cloned linkedin_jobs_crawler folder
git checkout master
git pull
- Download Chromedriver 2.41 and place the chromedriver executable file in the linkedin_jobs_crawler folder (the same directory as the script).
The crawler is operated from the command line. It takes a query (job position), a search location, and an output file name with the .csv extension, then writes the scraped results to /<script_dir>/output/<csv_file>.
First, navigate to the script directory. Then run the crawler with the following command, passing three required arguments via these flags:
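The output location described above can be sketched with a small helper. This is an illustrative reconstruction, not code from the script; the function name and directory handling are assumptions:

```python
import os

def output_path(script_dir, csv_file):
    # Results land in <script_dir>/output/<csv_file>, per the README.
    out_dir = os.path.join(script_dir, "output")
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if it is missing
    return os.path.join(out_dir, csv_file)

path = output_path(".", "results.csv")
print(path)
```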
python3 linkedin_jobs_crawler.py
-k or --keyword "<job_position>"
-l or --location "<location>"
-o or --output "<csv_filename>"
Some example commands would be:
python3 linkedin_jobs_crawler.py -k "engineer" -l "Vancouver, Canada" -o "output.csv"
python3 linkedin_jobs_crawler.py --keyword "developer" --location "89143" --output "results.csv"
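The flag handling above can be sketched with argparse. This is a hypothetical reconstruction of the crawler's CLI: the flag names match the README, but the script's actual parsing code may differ:

```python
import argparse

# Assumed CLI definition; mirrors the flags documented above.
parser = argparse.ArgumentParser(description="LinkedIn jobs crawler")
parser.add_argument("-k", "--keyword", required=True, help="job position to search for")
parser.add_argument("-l", "--location", required=True, help="search location (city or postal code)")
parser.add_argument("-o", "--output", required=True, help="output CSV filename")

# Parse a sample command line matching the first example above.
args = parser.parse_args(["-k", "engineer", "-l", "Vancouver, Canada", "-o", "output.csv"])
print(args.keyword, args.location, args.output)
```

Because all three arguments are marked required, omitting any flag makes argparse exit with a usage message rather than running the crawler with missing inputs.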