Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip: remove connection pool from sqlite #8797

Draft
wants to merge 17 commits into
base: dev
Choose a base branch
from
Draft

Conversation

abyesilyurt
Copy link
Contributor

@abyesilyurt abyesilyurt commented May 9, 2024

Sqlite connection is not thread-safe. We use complicated and unstable pooling to make connections thread-safe. We don't need to do any locking if we just open the connection per each thread.

The tests are also much more stable now.

@@ -76,7 +76,7 @@ def compute() -> int:

client_low_ds.refresh()
res = client_low_ds.code.compute(blocking=True)
assert res == compute(blocking=True).get()
assert res == compute(syft_no_node=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We start threads when syft_no_node=False, which leads to unstable tests.

return Err(
f"Failed to acquire lock for the operation {self.lock.lock_name} ({self.lock._lock})"
)
# locked = self.lock.acquire(blocking=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need locks for sqlite anymore. Thread safety is done by sqlite on file level.
Mongodb should handle thread safety as well.

@@ -22,6 +23,31 @@ def test_sqlite_store_partition_sanity(
assert hasattr(sqlite_store_partition, "searchable_keys")


def test_sqlite_store_partition_benchmark(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test to benchmark performance. to be deleted after reviews.

Copy link
Member

@kiendang kiendang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could def _cursor just yield con.cursor() instead of

cur = con.cursor()
yield cur.execute(sql, *args)

?

We can rewrite this

with self._cursor(
    insert_sql, [str(key), _repr_debug_(value), data, str(key)]
) as _:
    pass

into

with self._cursor as cursor:
    cursor.execute(insert_sql, [str(key), _repr_debug_(value), data, str(key)])

looks a bit more idiomatic to me.

@abyesilyurt
Copy link
Contributor Author

Could def _cursor just yield con.cursor() instead of

cur = con.cursor()
yield cur.execute(sql, *args)

?

We can rewrite this

with self._cursor(
    insert_sql, [str(key), _repr_debug_(value), data, str(key)]
) as _:
    pass

into

with self._cursor as cursor:
    cursor.execute(insert_sql, [str(key), _repr_debug_(value), data, str(key)])

looks a bit more idiomatic to me.

I agree that it looks better, but I think this change is out of scope for this PR as it leads to much bigger refactoring. I will create an issue for this one though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants