Add ability to parse code #293

santib · 2023-08-12T19:21:53Z

Something like this is needed so we can load codebases into vector DBs.

andreibondarev · 2023-08-13T09:26:20Z

@santib This opens up an interesting problem -- how do we chunk code?

santib · 2023-08-14T00:20:34Z

> @santib This opens up an interesting problem -- how do we chunk code?

Yeah, I just used the Text chunker for simplicity. Looking at https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/code_splitter, it seems they just define a list of separators for each language, and that's it.

I can change this PR to do something similar if you want
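
For context, the per-language separator idea can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR or in LangChain: the `SEPARATORS` table, its contents, and the `split_recursively` helper are hypothetical names made up for the example. The sketch only splits; it does not merge small pieces back together up to the chunk size.

```ruby
# Hypothetical separator table keyed by file extension (illustrative only,
# not the gem's actual constant). Coarser separators come first.
SEPARATORS = {
  "rb" => ["\nclass ", "\nmodule ", "\ndef ", "\n\n", "\n", " "],
  "py" => ["\nclass ", "\ndef ", "\n\n", "\n", " "]
}.freeze

# Split text on the first separator, then recurse with the remaining
# separators into any piece that is still longer than max_size.
def split_recursively(text, separators, max_size)
  return [text] if text.length <= max_size

  sep, *rest = separators
  # No separators left: fall back to fixed-width slices.
  return text.scan(/.{1,#{max_size}}/m) if sep.nil?
  # Separator absent from this text: try the next, finer one.
  return split_recursively(text, rest, max_size) unless text.include?(sep)

  # Split on the literal separator (a regexp avoids String#split's special
  # handling of " "), keeping trailing empty pieces so no text is lost.
  pieces = text.split(/#{Regexp.escape(sep)}/, -1)
  # Re-attach the separator to every piece after the first.
  pieces = pieces.each_with_index.map { |piece, i| i.zero? ? piece : sep + piece }
  pieces.reject(&:empty?).flat_map { |piece| split_recursively(piece, rest, max_size) }
end

code = "class A\n  def x\n    1\n  end\nend\n\nclass B\n  def y\n    2\n  end\nend\n"
chunks = split_recursively(code, SEPARATORS["rb"], 40)
```

With these inputs the source is cut at the `class` boundary, so each class ends up in its own chunk, and joining the chunks gives back the original text.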

andreibondarev · 2023-08-14T17:53:42Z

@santib Yeah, I like that!

santib · 2023-08-15T19:08:06Z

lib/langchain/data.rb

+    def chunks(options = {})
+      if Langchain::Processors::Code::EXTENSIONS.include?(source_type)
+        options = options.merge(separators: Langchain::Chunker::Base::LANG_SEPARATORS[source_type])
+        Langchain::Chunker::RecursiveText.new(@data, **options).chunks


TODO: add tests

santib force-pushed the add-code-loader branch from cd97b07 to 9553860 · August 12, 2023 19:28

Add ability to parse code

286f270

santib force-pushed the add-code-loader branch from 9553860 to 1e70bc7 · August 15, 2023 19:01

Use lang specific chunker when corresponds

dc52a2a

santib force-pushed the add-code-loader branch from 1e70bc7 to dc52a2a · August 15, 2023 19:07
