Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor logging functionality and add features #402

Open
MacOS opened this issue Feb 11, 2024 · 2 comments
Open

Refactor logging functionality and add features #402

MacOS opened this issue Feb 11, 2024 · 2 comments

Comments

@MacOS
Copy link
Contributor

MacOS commented Feb 11, 2024

Proposed new feature or change

This issue is a first dump of my thoughts. Feel free to comment on it and improve it.

Motive

Logging functionality is currently implemented with the print function and the logging package. In addition, some of these are commented out for no apparent reason.

The issue is that users want the logging to be done by one and only one way. In addition, the users want to inspect the logs in a log file.

I therefore propose to implement a coherent logging system. In this process, the following should be done.

  1. uncomment all commented out print and logging code,
  2. move all the logging to the logging package,
  3. add a central logging object for each module, and
  4. add that the logs are written to log files,
  5. remove all update strings in the messages,

Example

Actual

As stated above, the logic is spread out throughout the code base. Here are some examples for each of the five points above.

print used for logging.

print (f"update: embedding_handler - Milvus - Embeddings Created: {embeddings_created} of {num_of_blocks}")

An commented out print statement.

# print("update: embedding_summary - ", embedding_summary)

A logging statement with update in it.

logging.warning("update: embedding_handler - unable to determine if embeddings have "
"been properly counted and captured. Please check if databases connected.")

Desired

It is desirable to clean up the logging functionality and add logging to a file. The output of such a log message, should, for example,

2024-02-11T16:04:47.556391+00:00:embedding:INFO: Milvus - Embeddings Created: 10 of 50

or

2024-02-11T16:04:47.556391+00:00:embedding:WARNING: unable to determine if embeddings have been properly counted and captured.   Please check if databases connected.

Generally, the form should hence be

<DATETIME>:<MODULE_NAME>:<LOGGING_LEVEL>:<LOGGING_MESSAGE>

Existing solutions I looked for

Within the package, none, since it is currently not implemented at all.

Outside of the package, yes. A blueprint might be to take a look at how flask does it.

Proposed solution

In __init__.py add the following.

import os
import logging

from llmware.configs import LLMWareConfig


def create_logger():
    logging.basicConfig(level=LLMWareConfig.get_logging_level())

    logger = logging.getLogger(__name__)

    # maybe some config logic here, for example to set where the logs should be written to,
    # depending on the operating system. In addtion, setting the format of the log messages - see above.
    ....

    return logger

And then add in each module

from __init___ import create_logger

logger = create_logger()

Technical Analysis

The proposed efforts, and hence the changes, would clean the codebase (e.g. moving print class to the logger), streamline the logging functionality (e.g. adding the create_logger function to configure the logger centrally), and add new features for permanent logging to a file, or files.

Additional links

Logging HOWTO
Flask source code

@doberst
Copy link
Contributor

doberst commented Mar 13, 2024

@MacOS - agree 100% with your assessment - and would appreciate your help and support to drive this logging improvement. As you note, it will require updates across the code-base.

@MacOS
Copy link
Contributor Author

MacOS commented Mar 13, 2024

Great!

I think it is a good idea to chop this up in smaller junks. May I suggest the following steps as a starting point for discussions, in that order.

  1. A PR that deals with the commented out statements. Either by deleting them or adding them back.
  2. A PR that unifies the LOGGING_MESSAGE.
  3. A PR that moves all the logging to the logging package.
  4. A PR that adds a central logging object for each module, and
  5. A PR that adds that the logs can optionally be are written to log files,

Besides, it is really important that we agree on the logging format. The format I suggested has the obvious problem with the :, as it separates the parts of the logging message and the parts of the DATETIME.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants