Releases: Dataherald/dataherald

v1.0.3

30 Apr 15:22
d4d6f4e

Release Notes for Version 1.0.3

What's New

1. New features

  • Added Redshift support c8e55a2
  • Added multi-schema support for db connections; currently it only works for Postgres, BigQuery, Snowflake, and Databricks d4d6f4e

2. Improvements and fixes

  • Fixed URI validation for db connections f40ac0e
  • Fixed the fallback and confidence score 30f5226
  • Fixed the observation code blocks a11d1f1
  • Fixed refresh endpoint error handling 15b6d46
  • Fixed malformed SQL queries in intermediate steps 828c64d
  • The sql-generation endpoint now raises an error when it receives invalid SQL fbd96ea

v1.0.2

03 Apr 20:17
39bef01

Release Notes for Version 1.0.2

What's New

1. New features

  • Adds Astra vector store support 6f39892
  • Adds MS SQL Server support 078c17d
  • Adds Streaming endpoint to show intermediate steps 1205d8a
  • Adds support for Pinecone serverless 7906f03
  • Adds intermediate steps in the SQL Generation response 3dbd483
  • Adds a LangSmith metadata param (langsmith_metadata) to make filtering easier cf88a1b
  • Stores the db dialect when a db connection is created 809ac31

2. Improvements and fixes

  • Adds logs when a request fails 09f65c6
  • Adds descriptions to the new agent faf07de
  • Fixes malformed LLM output 4190b4d
  • Documents error codes e94c788
  • Fixes the running query forever issue cfb1d5b
  • Fixes the error parsing handler 8751410
  • Added ClickHouse HyperLogLog support to improve scanning 61a92c9
  • Fixes SQL generation 5160e8d
  • Fixes the background scanner process when run in parallel 88ee8fa
  • Fixes error handling for golden SQL additions 8efb00f

3. Migration Script

  • Purpose: To facilitate a smooth transition from version 1.0.1 to version 1.0.2, we've introduced a migration script.
  • Data Modifications: The script performs the following actions:
    • Decrypts the connection URI column for all db connections.
    • Executes a regex method to retrieve the db dialect.
    • Stores the dialect in a dialect column in the database_connections Mongo collection.

To run the migration script, use the following command:

docker-compose exec app python3 -m dataherald.scripts.populate_dialect_db_connection
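
The dialect-retrieval step can be pictured as a short regex over the decrypted connection URI. A minimal sketch (illustrative only, not the shipped script):

    import re

    def extract_dialect(connection_uri: str) -> str | None:
        # "postgresql+psycopg2://user:pass@host:5432/db" -> "postgresql"
        match = re.match(r"^([A-Za-z0-9]+)", connection_uri)
        return match.group(1).lower() if match else None

    print(extract_dialect("postgresql+psycopg2://user:pass@host:5432/db"))  # postgresql
    print(extract_dialect("snowflake://user:pass@account/db"))              # snowflake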

v1.0.1

04 Mar 19:53
5e7a5de

Release Notes for Version 1.0.1

What's New

1. New features

  • Added ClickHouse support d494fed
  • MariaDB/MySQL support officially added and documented. 7b86ad3
  • Added a refresh endpoint (POST /api/v1/table-descriptions/refresh) to get the table names from a specified database and store them in the table_descriptions Mongo collection. This improves response time when querying the table-description list endpoint (GET /api/v1/table-descriptions). 28b8130
  • Implemented error codes for better error handling. Errors now return a 400 HTTP status code. 2c70f16
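
As an illustration, the refresh endpoint can be called as below; the host and the request body shape (a db_connection_id field) are assumptions for the sketch:

    import requests

    # Hypothetical host and body; db_connection_id is assumed to be the required field.
    resp = requests.post(
        "http://localhost/api/v1/table-descriptions/refresh",
        json={"db_connection_id": "<db_connection_id>"},
    )
    resp.raise_for_status()
    print(resp.json())  # table-description records stored in the Mongo collection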

2. Changes and fixes

  • Reduced SSH fields in requests by utilizing the connection_uri field. 64ceb6e
  • Updated LLM with the latest models. dd440f2
  • Expanded functionality to allow SSH connections on different ports. 1a5a2be
  • Improved performance for scanning endpoint (POST /api/v1/table-descriptions/sync-schemas). 435884e

3. Migration Script

  • You don't need to update the data if you're already using the stable 1.0.0 version; you can simply pull these changes.

v1.0.0

16 Jan 16:52
a226a9d

Release Notes for Version 1.0.0

What's New

1. New Resources, Attributes, and Endpoints

  • Finetuning: One of our exciting new features is automatically fine-tuning GPT-family models on your golden question/SQL pairs (see the request sketch after this list).
    • POST /api/v1/finetuning: Create a fine-tuning job on your golden question/SQL pairs. The only required parameter is the db_connection_id, and you can optionally specify which golden question/SQL pairs to use for the fine-tuning process.
    • GET /api/v1/finetuning/{finetuning_id}: Retrieve the status of the fine-tuning process; once the status is SUCCEEDED you can use the model for SQL generation.
    • POST /api/v1/finetuning/{finetuning_id}/cancel: Cancel a fine-tuning job for whatever reason.
    • GET /api/v1/finetuning: List all of the fine-tuned models for a given db_connection_id.
    • DELETE /api/v1/finetuning/{finetuning_id}: Delete a given fine-tuned model from the finetunings collection.
  • Metadata: All resources now include a metadata attribute, allowing you to store additional information for internal purposes. Soon, GET list endpoints will support filtering based on metadata fields.
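
A hedged sketch of the fine-tuning flow above. db_connection_id is the documented required field; the host and the 'id' field in the response are assumptions for illustration:

    import requests

    BASE = "http://localhost/api/v1"

    # Create a fine-tuning job on the golden question/SQL pairs of a db connection.
    job = requests.post(
        f"{BASE}/finetuning",
        json={"db_connection_id": "<db_connection_id>"},
    ).json()

    # Poll the job; once the status is SUCCEEDED, the model can be used for SQL generation.
    status = requests.get(f"{BASE}/finetuning/{job['id']}").json()  # 'id' field name assumed
    print(status)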

2. Resource and Endpoint Changes

  • Renaming questions to prompts: The entity has been renamed to Prompt, and the collection is now called prompts. You can use the following endpoints to interact with this resource:

    • GET /api/v1/prompts: List all existing prompts.
    • POST /api/v1/prompts: Create a new prompt.
    • GET /api/v1/prompts/{prompt_id}: Retrieve a specific prompt.
    • PUT /api/v1/prompts/{prompt_id}: Update the metadata for a prompt.
  • Splitting responses into sql_generation and nl_generation: The previous responses resource has been divided into sql_generations and nl_generations. You can work with them as follows:

    • POST /api/v1/prompts/{prompt_id}/sql-generations: Create a sql-generation from an existing prompt.

    • POST /api/v1/prompts/sql-generations: Create a new prompt and a sql-generation.

    • GET /api/v1/prompts/sql-generations: List sql-generations.

    • GET /api/v1/sql-generations/{sql_generation_id}: Retrieve a specific sql-generation.

    • PUT /api/v1/sql-generations/{sql_generation_id}: Update the metadata for a sql-generation.

    • GET /api/v1/sql-generations/{sql_generation_id}/execute: Execute the created SQL and retrieve the result.

    • GET /api/v1/sql-generations/{sql_generation_id}/csv-file: Execute the created SQL and generate a CSV file using the result.

    • POST /api/v1/sql-generations/{sql_generation_id}/nl-generations: Create an nl-generation from an existing sql-generation.

    • POST /api/v1/prompts/{prompt_id}/sql-generations/nl-generations: Create a sql-generation and an nl-generation from an existing prompt.

    • POST /api/v1/prompts/sql-generations/nl-generations: Create a prompt, sql-generation, and nl-generation.

    • GET /api/v1/nl-generations: List all nl-generations.

    • GET /api/v1/nl-generations/{nl_generation_id}: Retrieve a specific nl-generation.

    • PUT /api/v1/nl-generations/{nl_generation_id}: Update the metadata for an nl-generation.

  • Renaming golden_records to golden_sqls: We've updated the name for all endpoints, entities, and collections.
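
A hedged sketch of the combined POST /api/v1/prompts/sql-generations/nl-generations call from the list above; the host and the nested body shape (prompt text and db_connection_id) are assumptions for illustration:

    import requests

    resp = requests.post(
        "http://localhost/api/v1/prompts/sql-generations/nl-generations",
        json={
            # body shape is assumed for the sketch
            "prompt": {
                "text": "What was total revenue last month?",
                "db_connection_id": "<db_connection_id>",
            },
        },
    )
    print(resp.json())  # the nl-generation, linked back to its sql-generation and prompt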

3. Migration Script

  • Purpose: To facilitate a smooth transition from version 0.0.5 to version 1.0.0, we've introduced a migration script.
  • Data Modifications: The script performs the following actions:
    • Renames the golden_records collection to golden_sqls.
    • Converts all related fields from ObjectId to string types.
    • Updates table descriptions by changing "SYNCHRONIZED" status to "SCANNED" and "NOT_SYNCHRONIZED" to "NOT_SCANNED."
    • Utilizes the existing questions collections to create the prompts collection.
    • Converts responses collections into sql_generations and nl_generations collections.

To run the migration script, use the following command:

docker-compose exec app python3 -m dataherald.scripts.migrate_v006_to_v100

We hope that these changes enhance your experience with our platform. If you have any questions or encounter any issues, please don't hesitate to reach out to our support team.

v0.0.6

13 Nov 17:59
f37743f

What's Changed

1. Changes in POST /api/v1/responses endpoint:

If the sql_query body parameter is not set, the response is regenerated. This process generates new values for sql_query, sql_result, and response.

2. Introducing the generate_csv flag:

The generate_csv flag is a parameter that allows generating a CSV file populated with the sql_query_result rows. It can be set in both the POST /api/v1/responses and POST /api/v1/questions endpoints (see the sketch after this list).

  • If the file is created, the response will include the field csv_file_path. For example:

    "csv_file_path": "s3://k2-core/c6ddccfc-f355-4477-a2e7-e43f77e31bbb.csv"
    
  • Additionally, if the generate_csv flag is set to True, the sql_query_result will return NULL when it contains more than 50 rows.
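
A hedged sketch of a question request with generate_csv enabled; the host, the other field names, and whether the flag lives in the body or the query string are assumptions for illustration:

    import requests

    resp = requests.post(
        "http://localhost/api/v1/questions",
        json={
            "db_connection_id": "<db_connection_id>",  # field name assumed
            "question": "How many orders were placed last week?",
            "generate_csv": True,  # placement in the body is assumed
        },
    )
    print(resp.json().get("csv_file_path"))  # e.g. an s3:// path when the file was created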

3. Configure S3 Credentials:

  • You can set your own S3 credentials for storing the CSV files when creating a db connection via the POST /api/v1/database-connections endpoint, as follows:
    "file_storage": {
        "name": "string",
        "access_key_id": "string",
        "secret_access_key": "string",
        "region": "string",
        "bucket": "string"
    }
  • If S3 credentials are not specified within the db_connection, the system will use the S3 credentials from your environment variables, as set in your .env file.

These changes will improve the consistency and maintainability of your application's data structures and APIs. If you encounter any issues during the upgrade process, please don't hesitate to reach out to our support team.

v0.0.5

25 Oct 22:57
d8439f3

What's Changed

1. Endpoint Update

  • Affected Endpoints: The changes impact two API endpoints:
    • POST /api/v1/database-connections: This endpoint is used to create a database connection.
    • PUT /api/v1/database-connections/{db_connection_id}: This endpoint is used to update a database connection.

  • Change Description: The llm_credentials object in these endpoints has been replaced with an llm_api_key field that accepts only string values. This simplifies how API keys are managed within the system.

2. Migration Script

  • Purpose: A migration script has been introduced to assist users in smoothly transitioning their data from version 0.0.4 to version 0.0.5.

  • Data Modification: This script operates on the database_connections collection and performs the following action:
    It replaces the llm_credentials field with the llm_api_key field wherever llm_credentials is populated, transferring the existing value to the new field.

To run the migration script, use the following command:

docker-compose exec app python3 -m dataherald.scripts.migrate_v004_to_v005
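
Conceptually, the replacement the script performs looks like the following pymongo sketch (illustrative only, not the shipped script; the database name and the sub-field holding the key are assumptions):

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["dataherald"]  # database name assumed

    for conn in db.database_connections.find({"llm_credentials": {"$ne": None}}):
        api_key = conn["llm_credentials"].get("api_key")  # sub-field name assumed
        db.database_connections.update_one(
            {"_id": conn["_id"]},
            {"$set": {"llm_api_key": api_key}, "$unset": {"llm_credentials": ""}},
        )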

These changes will improve the consistency and maintainability of your application's data structures and APIs. If you encounter any issues during the upgrade process, please don't hesitate to reach out to our support team.

v0.0.4

06 Oct 20:46
f57fde5

What's Changed

1. Endpoint Renaming
We have streamlined our API endpoints for better consistency and clarity:

Renamed Endpoints:

  • POST /api/v1/nl-query-responses is now POST /api/v1/responses.
  • POST /api/v1/question is now POST /api/v1/questions.

2. Endpoint Removal
In this version, we have removed the following endpoint:

  • PATCH /api/v1/nl-query-responses/{query_id}.

Note: Responses resources are now immutable, so you can only create new responses and not update existing ones.

3. MongoDB Collection and Field Renaming
To improve consistency and readability, we have renamed MongoDB collection and field names:

Collection Name Changes:

  • nl_questions collection has been renamed to questions.
  • nl_query_responses collection has been renamed to responses.

Field Name Changes (within the responses collection):

  • nl_question_id has been renamed to question_id.
  • nl_response has been renamed to response.

4. Use of ObjectId for Foreign Keys
To enhance data integrity and relationships, we have transitioned to using ObjectId types for foreign keys, providing stronger data typing.

5. Migration Script
We've created a migration script to help you smoothly transition your data from version 0.0.3 to version 0.0.4. This script updates collection names, field names, and foreign key data types to ObjectId. To run the migration script, use the following command:

docker-compose exec app python3 -m dataherald.scripts.migrate_v003_to_v004
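
The renames the script performs can be pictured with pymongo roughly as follows (illustrative sketch, not the shipped script; the database name is an assumption):

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["dataherald"]  # database name assumed

    # Collection renames
    for old, new in {"nl_questions": "questions", "nl_query_responses": "responses"}.items():
        if old in db.list_collection_names():
            db[old].rename(new)

    # Field renames within the responses collection
    db.responses.update_many(
        {}, {"$rename": {"nl_question_id": "question_id", "nl_response": "response"}}
    )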

Upgrade Instructions:

To upgrade to Version 0.0.4, follow these steps:

  • Ensure you have Docker Compose installed.
  • Pull the latest version of the application.
  • Run the provided migration script as shown above.

These changes will improve the consistency and maintainability of your application's data structures and APIs. If you encounter any issues during the upgrade process, please don't hesitate to reach out to our support team.

v0.0.3

25 Sep 23:20
9e2d119

What's Changed

1. Validate Database Connection Requests 5937b35

  • When a database connection is created or updated, it now attempts to establish a connection.
  • If the connection is successfully established, it is stored, and a 200 response is returned.
  • In case of failure, a 400 error response is generated.
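
The validation idea can be sketched as trying to open the connection before persisting it, assuming a FastAPI/SQLAlchemy stack (illustrative only, not the project's actual code):

    from fastapi import HTTPException
    from sqlalchemy import create_engine

    def validate_connection(connection_uri: str) -> None:
        try:
            engine = create_engine(connection_uri)
            with engine.connect():
                pass  # connection established; safe to store and return 200
        except Exception as exc:
            # surfaced to the client as a 400 error response
            raise HTTPException(status_code=400, detail=str(exc))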

2. Add LLM Credentials to Database Connection Endpoints 2d9e873

  • With the latest update, when creating or updating a database connection, you have the option to set LLM credentials. This allows you to use different keys for different connections.

3. SSH Connection Update a66f7d8

  • We have discontinued the use of the private_key_path field for SSH connections.
  • Instead, we now utilize the path_to_credentials_file to specify the path to the SSH private key file.

4. Enhanced Table Scanning with Background Tasks fdc3bb7

  • We have implemented background tasks for asynchronous table scanning.
  • The endpoint name has been updated from /api/v1/table-descriptions/scan to /api/v1/table-descriptions/sync-schemas.
  • This enhancement ensures that even if the process operates slowly, potentially taking several minutes, the HTTP response remains consistently fast and responsive.
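
The pattern can be sketched with FastAPI background tasks (the stack is an assumption; the function names and arguments are placeholders, not the project's real code):

    from fastapi import BackgroundTasks, FastAPI

    app = FastAPI()

    def scan_tables(db_connection_id: str) -> None:
        ...  # long-running schema scan; updates table-description statuses as it progresses

    @app.post("/api/v1/table-descriptions/sync-schemas")
    def sync_schemas(db_connection_id: str, background_tasks: BackgroundTasks):
        # schedule the scan and return immediately, keeping the HTTP response fast
        background_tasks.add_task(scan_tables, db_connection_id)
        return {"status": "SYNCHRONIZING"}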

5. Returns Scanned Tables and Not Scanned Tables 9e2d119

  • The GET /api/v1/table-descriptions endpoint now makes a db connection to retrieve all table names and checks which tables have been scanned in order to generate the response.
  • The status can be:
    • NOT_SYNCHRONIZED if the table has not been scanned
    • SYNCHRONIZING while the sync schema process is running
    • DEPRECATED if there is a row in our table-descriptions collection that is no longer in the database, probably because the table/view was deleted or renamed
    • SYNCHRONIZED when we have scanned the table
    • FAILED if anything failed during the sync schema process, and the error_message field stores the error.
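
For reference, the statuses above expressed as a small enum (the class name is hypothetical, for illustration only):

    from enum import Enum

    class TableDescriptionStatus(str, Enum):  # hypothetical name
        NOT_SYNCHRONIZED = "NOT_SYNCHRONIZED"
        SYNCHRONIZING = "SYNCHRONIZING"
        DEPRECATED = "DEPRECATED"
        SYNCHRONIZED = "SYNCHRONIZED"
        FAILED = "FAILED"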

6. Migration Script from v0.0.2 to v0.0.3 9e2d119

  • This script facilitates the transition from version v0.0.2 to v0.0.3 by performing the following essential task:
    In the table_descriptions collection, it updates the status field to the value SYNCHRONIZED.

To execute the script, simply run the following command:

docker-compose exec app python3 -m dataherald.scripts.migrate_v002_to_v003 

v0.0.2

14 Sep 15:54
86467b9

What's Changed

1. RESTful Endpoint Names and Swagger Grouping
We have made significant changes to our endpoint naming conventions, following RESTful principles. Additionally, we have organized the endpoints into logical sections within our Swagger documentation for easier navigation and understanding.

2. MongoDB Collection Name Changes
We have updated the names of several MongoDB collections. Here are the collection name changes:

  • nl_query_response ➡️ nl_query_responses
  • nl_question ➡️ nl_questions
  • database_connection ➡️ database_connections
  • table_schema_detail ➡️ table_descriptions

3. Migration to db_connection_id for MongoDB Collections
Previously, we used a db_alias field to relate MongoDB collections. In this release, we have transitioned to using a new field called db_connection_id to establish relationships between collections.

4. Renamed Core Methods for Code Clarity
To improve the clarity of our codebase, we have renamed several core methods.

5. Migration Script from v0.0.1 to v0.0.2
We understand the importance of a smooth transition between versions. This script performs the following actions:

  • Adds the db_connection_id relation for all MongoDB collections.
  • Renames all MongoDB collection names to align with the new naming conventions.
  • Deletes the vector store data (Pinecone or Chroma) and re-uploads it from the golden_records collection.

To execute the script, run the following command:
docker-compose exec app python3 -m dataherald.scripts.migrate_v001_to_v002
