jleguina/entity-normalization

About The Project

This project is an entity normalisation engine developed for the Vector AI recruitment process. It supports entity normalisation for the following types of entities:

  • Companies, businesses;
  • Products, objects;
  • Locations, cities, countries;
  • Serial numbers;
  • Street addresses.

The model takes as input a stream of strings in the classes above. There is no context provided for each entity.

The model normalises the first three types of entities to suitable Wikipedia articles. Because serial numbers and street addresses are unique, they are instead normalised according to the string similarity of the input entities, measured with the Levenshtein distance.
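
As an illustration of similarity-based normalisation with the Levenshtein distance, here is a minimal sketch. It is not taken from `entity_norm.py`: the function names (`levenshtein`, `similarity`, `group_entities`), the length-normalised score and the `0.8` threshold are all assumptions chosen for the example.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_entities(entities: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedy grouping (assumed strategy): each entity joins the first
    group whose first member is similar enough, else starts a new group."""
    groups: List[List[str]] = []
    for entity in entities:
        key = entity.lower().strip()
        for group in groups:
            if similarity(key, group[0].lower().strip()) >= threshold:
                group.append(entity)
                break
        else:
            groups.append([entity])
    return groups
```

With this sketch, `group_entities(["SN-12345", "SN-12346", "XYZ-999"])` places the first two serial numbers in one group, since they differ by a single character, and leaves `"XYZ-999"` in a group of its own. The threshold trades off merging near-duplicates against keeping distinct entities apart, and would need tuning on real data.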

The model accepts entities in any language supported by the Google Translator API.

Getting Started

To set up this project:

  1. Clone the GitHub repo:

    git clone https://github.com/jleguina0/entity-normalization.git

  2. Create a suitable virtual environment and install the dependencies:

    • With conda:
      cd entity-normalization
      conda env create -f environment.yml
      conda activate entity-norm37
    • Otherwise, create a virtual environment with Python 3.7 and run:
      pip install -r requirements.txt
  3. Run the normalisation engine on some predefined examples in various languages:

    python entity_norm.py

Contact

Javier Leguina Peral - jleguina0@gmail.com

Project Link: https://github.com/jleguina0/entity-normalization
