Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

files cache with size #7846

Draft
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

ThomasWaldmann
Copy link
Member

No description provided.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Sep 25, 2023

This is trying to prepare a bigger change in how borg works: giving up with precise refcounting and removing chunk size information from the chunks index/cache (so borg could just work with the set of existing chunks with no further information about them in the chunks index/cache).

A first step into that direction is adding the chunk sizes to the files cache (it currently only has the chunk ids).

If borg create detects a file is unchanged, we need to have that information at hand to create the item's chunk list without reading/chunking the file.

NewCache tries to be similar to AdHocCache, but additionally has the files cache.

@ThomasWaldmann ThomasWaldmann force-pushed the files-cache-with-size branch 3 times, most recently from 527b612 to aef7d7a Compare September 25, 2023 21:40
@codecov-commenter
Copy link

codecov-commenter commented Sep 25, 2023

Codecov Report

Attention: Patch coverage is 63.58696% with 134 lines in your changes are missing coverage. Please review.

Project coverage is 83.06%. Comparing base (00962f9) to head (2b27872).
Report is 1 commits behind head on master.

Files Patch % Lines
src/borg/cache.py 61.46% 117 Missing and 9 partials ⚠️
src/borg/archive.py 80.64% 6 Missing ⚠️
src/borg/archiver/config_cmd.py 0.00% 1 Missing ⚠️
src/borg/archiver/rinfo_cmd.py 66.66% 0 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7846      +/-   ##
==========================================
- Coverage   83.75%   83.06%   -0.69%     
==========================================
  Files          67       67              
  Lines       12060    12143      +83     
  Branches     2189     2191       +2     
==========================================
- Hits        10101    10087      -14     
- Misses       1367     1463      +96     
- Partials      592      593       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

the files cache used to have only the chunk ids,
so it had to rely on the chunks index having the
size information - which is problematic with e.g.
the AdhocCache (has size==0 for all not new chunks) and blocked using the files cache there.
incref: returns (id, size), so it needs the size if it can't
get it from the chunks index. also needed for updating stats.

decref: caller does not always have the chunk size (e.g. for
metadata chunks),
as we consider 0 to be an invalid size, we call with size == 1
in that case. thus, stats might be slightly off.
when given, force using the AdHocCache.
if a chunk is missing in repo, it will also be missing in a ad-hoc
built chunks index.
thus:

- no cache.path
- skip on-disk cache corruption tests for AdHocCache
Also:
- move common code to ChunksMixin
- always use ._txn_active (not .txn_active)

Some tests are still failing.
if we use AdHocCache or NewCache, we do not have precise refcounting.
thus, we do not delete repo objects as their refcount does not go to zero.

check --repair will just remove the orphans.
Only LocalCache implements these.
NewCache does not do precise refcounting, thus chunks won't be deleted
from the repo at "borg delete" time.

"borg check --repair" would remove such chunks IF they are orphans.
NewCache does not do precise refcounting, thus it does not know about
unique chunks.

For now, let's just test num_chunks, but not unique_chunks.
NewCache and AdHocCache do not have a persistent chunks index,
so both check_cache and test_check_cache are pointless.
We can not use unique_chunks counter with NewCache,
thus we use a simpler (and weaker) assertion.
removed some code borg had for backwards compatibility with
old borg versions (that had previous_location only in the
cache).

now the repo location is only checked against the location
file in the security dir, simplifying the code and also
fixing a related test failure with NewCache.

also improved test_repository_move to test for aborting in
case the repo location changed unexpectedly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants