Releases: philippgille/chromem-go

v0.6.0 (2024-04-25)

25 Apr 21:11

Highlights in this release are an extended interface, experimental WebAssembly bindings, and the option to use a custom Ollama URL. Just as notable: three people contributed to this release. Thank you so much! 🙇‍♂️

Added

  • Added Collection.Delete() to delete documents from a collection (PR #63 by @iwilltry42)
  • Added an experimental WebAssembly binding (package wasm) and example (PR #69)

Improved

  • Use prefixes for nomic-embed-text model in RAG-Wikipedia-Ollama example (PR #49, #65)
    • Thanks @rinor for pointing out the bug!
  • Made Ollama URL configurable (PR #64 by @erikdubbelboer)
  • Added building of code examples to CI (PR #66)
  • Improved RAG template (PR #67)
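
The nomic-embed-text fix above concerns task prefixes: the model expects text to be prefixed according to its role, with "search_document: " for stored documents and "search_query: " for queries. The following is a minimal sketch of that idea only. The helper names are hypothetical and not part of chromem-go or the example's actual code.

```go
package main

import "fmt"

// Task prefixes expected by the nomic-embed-text model.
const (
	docPrefix   = "search_document: "
	queryPrefix = "search_query: "
)

// prefixDocument is a hypothetical helper: it marks text as a document
// to be stored and searched later.
func prefixDocument(text string) string { return docPrefix + text }

// prefixQuery is a hypothetical helper: it marks text as a search query.
func prefixQuery(text string) string { return queryPrefix + text }

func main() {
	// The prefixed strings are what would be sent to the embedding model.
	fmt.Println(prefixDocument("The sky is blue because of Rayleigh scattering."))
	fmt.Println(prefixQuery("Why is the sky blue?"))
}
```

Using the wrong prefix (or none) on either side may noticeably reduce retrieval quality, so documents and queries should be prefixed consistently.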

Breaking changes

  • NewEmbeddingFuncOllama now requires a second parameter for the base URL. Pass an empty string to keep using the default URL that previous versions used.
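
For callers upgrading from v0.5.0, the change presumably amounts to adding an empty string as the second argument. The "empty means default" convention can be sketched as below. The helper name and the default URL value are assumptions made for illustration, so check the package documentation for the real default.

```go
package main

import "fmt"

// assumedDefaultBaseURL is an assumption for this sketch, not a value
// confirmed by the release notes.
const assumedDefaultBaseURL = "http://localhost:11434/api"

// resolveBaseURL is a hypothetical helper showing the convention:
// an empty base URL falls back to the default.
func resolveBaseURL(baseURL string) string {
	if baseURL == "" {
		return assumedDefaultBaseURL
	}
	return baseURL
}

func main() {
	fmt.Println(resolveBaseURL(""))                                 // falls back to the assumed default
	fmt.Println(resolveBaseURL("http://ollama.internal:11434/api")) // custom URL is kept as-is
}
```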

Full Changelog: v0.5.0...v0.6.0

v0.5.0 (2024-03-23)

23 Mar 11:52

Highlights in this release are query performance improvements (5x faster, 98% fewer memory allocations), export/import of the entire DB to/from a single file with optional gzip-compression and AES-GCM encryption, optional gzip-compression for the regular persistence, a new code example for semantic search across 5,000 arXiv papers, and an embedding func for Cohere.

Added

  • Added arXiv semantic search example (PR #45)
  • Added basic query benchmark (PR #46)
  • Added unit test for collection query errors (PR #51)
  • Added Collection.QueryEmbedding() method for when you already have the embedding of your query (PR #52)
  • Added export and import of the entire DB to/from a single file, with optional gzip-compression and AES-GCM encryption (PR #58)
  • Added optional gzip-compression to the regular persistence (i.e. the DB from NewPersistentDB() which writes a file for each added collection and document) (PR #59)
  • Added minimal example (PR #60, #62)
  • Added embedding func for Cohere (PR #61)
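
To illustrate what "optional gzip-compression and AES-GCM encryption" can look like with the Go standard library, here is a minimal sketch. It is not chromem-go's implementation: the function names, the compress-then-encrypt order, and the nonce-prefix layout are all assumptions chosen for the example.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"io"
)

// seal gzip-compresses data and then encrypts it with AES-GCM.
// The random nonce is prepended to the returned ciphertext.
func seal(data, key []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip footer
		return nil, err
	}
	block, err := aes.NewCipher(key) // key must be 16, 24 or 32 bytes long
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, buf.Bytes(), nil), nil
}

// open reverses seal: it decrypts and authenticates, then decompresses.
func open(sealed, key []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed data too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	compressed, err := gcm.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		return nil, err // wrong key or tampered data
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	key := make([]byte, 32) // all-zero demo key; use a real secret in practice
	sealed, err := seal([]byte("hello chromem"), key)
	if err != nil {
		panic(err)
	}
	plain, err := open(sealed, key)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

Compressing before encrypting is the usual order, because well-encrypted data is close to random and no longer compresses.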

Improved

  • Changed the example link target to directory instead of main.go file (PR #43)
  • Improved query performance (5x faster, 98% fewer memory allocations) (PR #47, #53, #54)
    • benchstat output
      goos: linux
      goarch: amd64
      pkg: github.com/philippgille/chromem-go
      cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                                          │    before     │               after                 │
                                          │    sec/op     │    sec/op     vs base               │
      Collection_Query_NoContent_100-8      413.69µ ±  4%   90.79µ ±  2%  -78.05% (p=0.002 n=6)
      Collection_Query_NoContent_1000-8     2759.4µ ±  0%   518.8µ ±  1%  -81.20% (p=0.002 n=6)
      Collection_Query_NoContent_5000-8     12.980m ±  1%   2.144m ±  1%  -83.49% (p=0.002 n=6)
      Collection_Query_NoContent_25000-8    66.559m ±  1%   9.947m ±  2%  -85.06% (p=0.002 n=6)
      Collection_Query_NoContent_100000-8   282.41m ±  3%   39.75m ±  1%  -85.92% (p=0.002 n=6)
      Collection_Query_100-8                416.75µ ±  2%   90.99µ ±  1%  -78.17% (p=0.002 n=6)
      Collection_Query_1000-8               2792.8µ ± 23%   595.2µ ± 13%  -78.69% (p=0.002 n=6)
      Collection_Query_5000-8               15.643m ±  1%   2.556m ±  1%  -83.66% (p=0.002 n=6)
      Collection_Query_25000-8               78.29m ±  1%   11.66m ±  1%  -85.11% (p=0.002 n=6)
      Collection_Query_100000-8             338.54m ±  5%   39.70m ± 12%  -88.27% (p=0.002 n=6)
      geomean                                12.97m         2.192m        -83.10%
      
                                          │      before      │               after                 │
                                          │       B/op       │     B/op      vs base               │
      Collection_Query_NoContent_100-8       1211.007Ki ± 0%   5.030Ki ± 0%  -99.58% (p=0.002 n=6)
      Collection_Query_NoContent_1000-8      12082.16Ki ± 0%   13.24Ki ± 0%  -99.89% (p=0.002 n=6)
      Collection_Query_NoContent_5000-8      60394.23Ki ± 0%   45.99Ki ± 0%  -99.92% (p=0.002 n=6)
      Collection_Query_NoContent_25000-8     301962.1Ki ± 0%   206.7Ki ± 0%  -99.93% (p=0.002 n=6)
      Collection_Query_NoContent_100000-8   1207818.1Ki ± 0%   791.4Ki ± 0%  -99.93% (p=0.002 n=6)
      Collection_Query_100-8                 1211.006Ki ± 0%   5.033Ki ± 0%  -99.58% (p=0.002 n=6)
      Collection_Query_1000-8                12082.11Ki ± 0%   13.25Ki ± 0%  -99.89% (p=0.002 n=6)
      Collection_Query_5000-8                60394.10Ki ± 0%   46.04Ki ± 0%  -99.92% (p=0.002 n=6)
      Collection_Query_25000-8               301962.1Ki ± 0%   206.8Ki ± 0%  -99.93% (p=0.002 n=6)
      Collection_Query_100000-8             1207818.1Ki ± 0%   791.4Ki ± 0%  -99.93% (p=0.002 n=6)
      geomean                                   49.13Mi        54.97Ki       -99.89%
      
                                          │    before     │              after                │
                                          │   allocs/op   │ allocs/op   vs base               │
      Collection_Query_NoContent_100-8        238.00 ± 0%   94.00 ± 1%  -60.50% (p=0.002 n=6)
      Collection_Query_NoContent_1000-8       2038.5 ± 0%   140.5 ± 0%  -93.11% (p=0.002 n=6)
      Collection_Query_NoContent_5000-8      10039.0 ± 0%   172.0 ± 1%  -98.29% (p=0.002 n=6)
      Collection_Query_NoContent_25000-8     50038.0 ± 0%   204.0 ± 1%  -99.59% (p=0.002 n=6)
      Collection_Query_NoContent_100000-8   200038.0 ± 0%   232.0 ± 3%  -99.88% (p=0.002 n=6)
      Collection_Query_100-8                  238.00 ± 0%   94.50 ± 1%  -60.29% (p=0.002 n=6)
      Collection_Query_1000-8                 2038.0 ± 0%   141.0 ± 1%  -93.08% (p=0.002 n=6)
      Collection_Query_5000-8                10038.0 ± 0%   174.5 ± 2%  -98.26% (p=0.002 n=6)
      Collection_Query_25000-8               50038.0 ± 0%   205.5 ± 2%  -99.59% (p=0.002 n=6)
      Collection_Query_100000-8             200038.5 ± 0%   233.0 ± 1%  -99.88% (p=0.002 n=6)
      geomean                                 8.661k        161.4       -98.14%
      
  • Extended parameter validation (PR #50, #51)
  • Simplified unit tests (PR #55)
  • Improved NewPersistentDB() path handling (PR #56)
  • Improved loading of persistent DB (PR #57)
  • Increased unit test coverage in several of the other listed PRs

Fixed

  • Fixed path joining (PR #44)

Breaking changes

  • Due to vectors now being normalized at the time of adding the document to the collection instead of when querying, the persisted data from prior versions is incompatible with this version (PR #47)
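
Why normalizing at insert time changes the stored data can be seen in a small sketch: once every stored vector has unit length, cosine similarity at query time reduces to a plain dot product, but vectors persisted without normalization would then give wrong scores. The function names below are hypothetical and only illustrate the idea, not the library's code.

```go
package main

import (
	"fmt"
	"math"
)

// normalize returns a unit-length copy of v.
// A zero vector is returned unchanged to avoid dividing by zero.
func normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	norm := math.Sqrt(sum)
	out := make([]float32, len(v))
	if norm == 0 {
		copy(out, v)
		return out
	}
	for i, x := range v {
		out[i] = float32(float64(x) / norm)
	}
	return out
}

// dot returns the dot product of two equal-length vectors.
// For unit-length vectors this equals their cosine similarity.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func main() {
	doc := normalize([]float32{3, 4})   // normalized once, when the document is added
	query := normalize([]float32{6, 8}) // normalized once per query
	fmt.Printf("similarity: %.3f\n", dot(doc, query))
}
```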

v0.4.0 (2024-03-06)

06 Mar 22:46

Highlights in this release are optional persistence, an extended interface, support for creating embeddings with Ollama, the exporting of the Document struct, and more Go-idiomatic methods to add documents to collections.

Added

  • Extended the interface:
    • DB.ListCollections() (PR #12)
    • DB.GetCollection() (PR #13 + #19)
    • DB.DeleteCollection() (PR #14)
    • DB.Reset() (PR #15)
    • DB.GetOrCreateCollection() (PR #22)
    • Collection.Count() (PR #27)
    • Document struct, NewDocument() function, Collection.AddDocument() and Collection.AddDocuments() methods (PR #34)
      • More Go-idiomatic alternatives to Collection.Add()
  • Added various unit tests (PR #20, #39)
  • Added optional persistence! Via multiple PRs:
  • Added support for creating embeddings with Ollama (PR #32)
  • Added example documentation (PR #42)

Improved

  • Improved example (PR #11, #28, #33)
  • Stop exporting Collection.Metadata (PR #16)
    • Goal: Prevent direct modifications, which could cause data races if the user modifies the metadata while chromem-go is, for example, ranging over it during a Collection.Query() call.
  • Copy metadata in constructors (PR #17)
    • Goal: Prevent direct modifications, which could cause data races if the user modifies the original map while chromem-go is, for example, ranging over it.
  • Improved CI (PR #18)
    • Add Go 1.22 to test matrix, update used GitHub Action from v4 to v5, use race detector during tests
  • Reorganize code internally (PR #21)
  • Switched to newer recommended check for file related ErrNotExist errors (PR #29)
  • Added more validations in several existing methods (PR #30)
  • Renamed internal variable (PR #37)
  • Fail unit tests immediately (PR #40)
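
The two metadata-related items above (unexporting the field and copying in constructors) follow a common defensive pattern. A minimal sketch, using a hypothetical collection type rather than chromem-go's actual one:

```go
package main

import "fmt"

// collection is a hypothetical stand-in for a type that stores metadata.
// The field is unexported so callers cannot modify it directly.
type collection struct {
	name     string
	metadata map[string]string
}

// newCollection copies the caller's map, so that later changes to the
// original map cannot race with reads inside the collection.
func newCollection(name string, metadata map[string]string) *collection {
	m := make(map[string]string, len(metadata))
	for k, v := range metadata {
		m[k] = v
	}
	return &collection{name: name, metadata: m}
}

func main() {
	meta := map[string]string{"source": "wiki"}
	c := newCollection("docs", meta)

	// The caller mutates its own map after construction.
	meta["source"] = "changed by caller"

	// The collection still holds its own private copy.
	fmt.Println(c.metadata["source"])
}
```

Without the copy, both the caller and the collection would share one map, and Go maps are not safe for concurrent reads and writes.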

Fixed

  • Fixed metadatas validation in Collection.AddConcurrently() (PR #35)
  • Fixed Godoc of Collection.Query() method (PR #36)
  • Fixed length of result slice (PR #38)
  • Fixed filter test (PR #41)

Breaking changes

  • Because functions can't be (de-)serialized, GetCollection requires a new parameter of type EmbeddingFunc, so that the correct func can be set when a DB with persistence has just loaded its collections and documents from storage. (PR #25)
  • Some methods now return an error (due to file operations when persistence is used)
  • Additional validations will return an early error, but most (if not all) prior calls with the invalid parameters probably led to some errors down the line anyway
  • Collection.Metadata is not exported anymore
  • Result.Document field was renamed to Result.Content, to avoid confusion with the now exported Document struct
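
The first breaking change rests on a general Go limitation: function values cannot be encoded by the standard serializers, so only the data can be persisted and the function has to be supplied again after loading. The sketch below shows this with encoding/json, which is chosen here only for illustration and is not necessarily the encoding chromem-go uses. All type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embeddingFunc is a hypothetical stand-in for an embedding function type.
type embeddingFunc func(text string) ([]float32, error)

// storedCollection holds only the fields that can be persisted.
type storedCollection struct {
	Name string `json:"name"`
}

// collection pairs the persisted data with a func that is never persisted.
type collection struct {
	storedCollection
	embed embeddingFunc
}

// serializable reports whether v can be encoded as JSON.
func serializable(v interface{}) bool {
	_, err := json.Marshal(v)
	return err == nil
}

// loadCollection decodes the persisted data and re-attaches the func
// supplied by the caller, since it could not be stored.
func loadCollection(data []byte, embed embeddingFunc) (*collection, error) {
	var s storedCollection
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &collection{storedCollection: s, embed: embed}, nil
}

func main() {
	embed := embeddingFunc(func(text string) ([]float32, error) {
		return []float32{float32(len(text))}, nil // toy "embedding"
	})
	fmt.Println("func serializable:", serializable(embed))

	data, _ := json.Marshal(storedCollection{Name: "docs"})
	c, err := loadCollection(data, embed)
	if err != nil {
		panic(err)
	}
	v, _ := c.embed("abc")
	fmt.Println(c.Name, v)
}
```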

v0.3.0 (2024-02-10)

10 Feb 18:19

Added

Improved

  • Improve concurrency when adding documents to collection (PR #2)
  • Rename Client to DB to better indicate that the database is embedded and there's no client-server separation (PR #3)
  • Change OpenAI embedding model from "text-embedding-ada-002" to "text-embedding-3-small" (PR #4)
  • Allow custom base URL for OpenAI, enabling the use of Azure OpenAI, LiteLLM, Ollama etc. (PR #7)
  • Renamed EmbeddingFunc constructors to follow best practice (PR #9)

Fixed

  • Don't allow nResults arg < 0 (PR #5)

Breaking changes

  • Several function names and signatures were changed in this release. This can happen as long as the version is at v0.x.y.

v0.2.0 (2024-01-01)

01 Jan 18:20

Added

  • Added GitHub Actions config (commit)
  • Added CHANGELOG.md (commit)
  • Exported embedding creation functions (commit)
  • Added Collection.AddConcurrently to add embeddings concurrently (commit)

Improved

  • Improved example code (commit)
  • Removed unused field in Client (commit)
  • Improved validation in Query method (commit)
  • Added and improved Godoc (commit)
  • Improved locking around a collection's documents (commit)
  • Removed dependency on third party library for OpenAI (commit)
  • Parallelized document querying (PR #1)
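
As an illustration of what parallelized querying can mean, the sketch below splits the similarity computation across goroutines, with each goroutine writing to its own slice indices. It is an assumption-based example and not the code from PR #1. The function names and the chunking strategy are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// dot returns the dot product of two equal-length vectors.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// similarities computes dot(query, doc) for every doc, dividing the docs
// into chunks that are processed by separate goroutines. Each goroutine
// writes only to its own indices of out, so no mutex is needed.
func similarities(query []float32, docs [][]float32) []float32 {
	out := make([]float32, len(docs))
	workers := runtime.NumCPU()
	if workers > len(docs) {
		workers = len(docs)
	}
	if workers == 0 {
		return out // nothing to do for an empty collection
	}
	chunk := (len(docs) + workers - 1) / workers // ceiling division

	var wg sync.WaitGroup
	for start := 0; start < len(docs); start += chunk {
		end := start + chunk
		if end > len(docs) {
			end = len(docs)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				out[i] = dot(query, docs[i])
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	docs := [][]float32{{1, 0}, {0, 1}, {1, 1}}
	fmt.Println(similarities([]float32{1, 2}, docs))
}
```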

v0.1.0 (2023-12-29)

29 Dec 00:01

Initial release with a minimal Chroma-like interface and a working retrieval augmented generation (RAG) example.