Skip to content

Latest commit

 

History

History
44 lines (32 loc) · 1.8 KB

content_parser.md

File metadata and controls

44 lines (32 loc) · 1.8 KB

How to Create a Custom Content Parser

Admittedly, creating a custom content parser is a bit cumbersome. However, once it's configured, it shouldn't change frequently.

Inspect the DOM

Right-click on any part of the page and select "Inspect" to open the "Elements" tab of the Chrome Developer Tools.

Screenshot of Multimodal Screenshot of Multimodal

Find the Containing <div>

An effective approach is to find the innermost (or outermost depending on your use case) containing <div> of the selected element that has a unique id or class that can be used to distinguish the container.

Note: The container tag doesn't necessarily have to be a <div> tag. It can be any tag.

Determine Selector Query

Determine the selector (and selectorAll) queries and add them to the existing content parser configuration in the Lumos Options page. See documentation for querySelector() and querySelectorAll() to confirm all querying capabilities and see more examples.

Example queries:

  1. Select element by tag name: tagName
  2. Select element by id (leading #): #elementId
  3. Select element by class name (leading .): .className

querySelector() supports complex selectors and negation.

Example config for a single domain:

{
  "domain.com": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      "tagName",
      "#elementId"
    ],
    "selectorsAll": [
      ".className"
    ]
  }
}