gpg problem while restoring encrypted backup #409

Open · omega opened this issue Aug 29, 2018 · 6 comments

@omega

omega commented Aug 29, 2018

While setting up encryption of our wal-e backups in our test environment, I stumbled upon a slight problem when trying to restore to a slightly different environment. I don't know if this comes down to different gpg versions or something else, but here goes.

Server the backups are made from (tried both the GCS and file backends):

gpg version: 1.4.20
wal-e version: 1.1.0
PostgreSQL: 10.4

Machine I attempt to restore to:

gpg version: 2.2.4
wal-e version: 1.1.0
PostgreSQL: 10.5

I do the following:

gpg --recipient "<key_id>" --encrypt /tmp/tmp.18nEVROjB7
gpg --decrypt "/tmp/tmp.18nEVROjB7.gpg"

envdir restore/env /usr/local/bin/wal-e backup-fetch /var/lib/postgresql/10/tmprestore LATEST

The output from wal-e is then:

wal_e.main   INFO     MSG: starting WAL-E
        DETAIL: The subcommand is "backup-fetch".
        STRUCTURED: time=2018-08-29T06:32:46.298928-00 pid=14258
wal_e.worker.file.file_worker INFO     MSG: beginning partition download
        DETAIL: The partition being downloaded is part_00000000.tar.lzo.
        HINT: The absolute file key is home/omega/tmp/koriander-pg-backup/basebackups_005/base_000000010000011D00000008_00059128/tar_partitions/part_00000000.tar.lzo.
        STRUCTURED: time=2018-08-29T06:32:46.413433-00 pid=14258
wal_e.worker.file.file_worker INFO     MSG: beginning partition download
        DETAIL: The partition being downloaded is part_00000004.tar.lzo.
        HINT: The absolute file key is home/omega/tmp/koriander-pg-backup/basebackups_005/base_000000010000011D00000008_00059128/tar_partitions/part_00000004.tar.lzo.
        STRUCTURED: time=2018-08-29T06:32:46.420522-00 pid=14258
wal_e.worker.file.file_worker INFO     MSG: beginning partition download
        DETAIL: The partition being downloaded is part_00000001.tar.lzo.
        HINT: The absolute file key is home/omega/tmp/koriander-pg-backup/basebackups_005/base_000000010000011D00000008_00059128/tar_partitions/part_00000001.tar.lzo.
        STRUCTURED: time=2018-08-29T06:32:46.426142-00 pid=14258
wal_e.worker.file.file_worker INFO     MSG: beginning partition download
        DETAIL: The partition being downloaded is part_00000003.tar.lzo.
        HINT: The absolute file key is home/omega/tmp/koriander-pg-backup/basebackups_005/base_000000010000011D00000008_00059128/tar_partitions/part_00000003.tar.lzo.
        STRUCTURED: time=2018-08-29T06:32:46.431523-00 pid=14258
gpg: block_filter 0x0000564683c9aa80: read error (size=15700,a->size=15700)
gpg: block_filter 0x0000564683c9a060: read error (size=16187,a->size=16187)
gpg: WARNING: encrypted message has been manipulated!
gpg: block_filter: pending bytes!
gpg: block_filter: pending bytes!

The base_backup files are copied with scp from the remote server, and manually decrypting them with gpg --decrypt works on all of them, seemingly without error, so it seems the files themselves are OK?

Not sure how or where to dig deeper into this, but please let me know if I can provide any additional information. I can't share this backup or the keys used, but if needed I can create another DB, back it up, and encrypt it with someone's key if that helps.

@3manuek

3manuek commented Oct 3, 2018

Check the status of the key (e.g. whether it is expired). In that case, wal-e does the push with an invalid key (and the backup is probably not encrypted).

@omega

omega commented Oct 4, 2018

As I wrote towards the end, I can decrypt the files manually with gpg --decrypt just fine using the same key, and the key is not expired (I can also see wal-e using it when it does the push), so I'm a bit confused as to what could be causing this.

@normalerweise

I have the same issue with Azure Storage. Since @omega stated he tried both GCS and the file backend, it looks like the issue is not related to the storage backend.
I also made the observation that using gpg on the command line works if the file was not downloaded by wal-e.

@normalerweise

I don't fully understand the issue. However, I came across #132, where the root cause was described as a missing filesystem-level permission: "didn't have permission to create subfolder."

@normalerweise

I've found the root cause of the issue (in my case). wal-e basically sets up a data processing pipeline consisting of the following steps (a rough sketch follows after the list):

  1. Azure File Storage -> download the files/binary data via the Azure Python SDK
  2. Pipe the downloaded files into gpg to decrypt the incoming data stream (using the gpg CLI)
  3. Decompress the data stream using LZO (using the lzop CLI)
  4. Untar the single tar archive into the original PostgreSQL data directory structure (using the Python tarfile module)
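
A rough, hand-rolled sketch of steps 2-4 (not wal-e's actual code; the partition name and target directory are placeholders, and step 1 is assumed to have already downloaded the file):

# Hand-rolled equivalent of the decrypt -> decompress -> untar steps.
import subprocess
import tarfile

partition = "part_00000000.tar.lzo"               # already-downloaded, gpg-encrypted partition
target_dir = "/var/lib/postgresql/10/tmprestore"  # restore destination

# Step 2: decrypt with the gpg CLI, streaming plaintext to stdout.
gpg = subprocess.Popen(["gpg", "--decrypt", partition], stdout=subprocess.PIPE)

# Step 3: decompress the LZO stream with the lzop CLI.
lzop = subprocess.Popen(["lzop", "-dc"], stdin=gpg.stdout, stdout=subprocess.PIPE)
gpg.stdout.close()  # let gpg receive SIGPIPE if lzop exits early

# Step 4: untar the resulting stream into the data directory.
with tarfile.open(fileobj=lzop.stdout, mode="r|") as tar:
    tar.extractall(path=target_dir)

lzop.wait()
gpg.wait()

If a late stage aborts mid-stream (for example the tar extraction in step 4), the earlier stages see their output pipe close early and report errors of their own, which may explain why gpg complains ("block_filter: pending bytes!") even though the file decrypts fine in isolation.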

My backup contained a symlink, because the pg_wal directory is stored on a separate disk.
I didn't load the backup into an empty folder; I loaded it into a newly initialized data directory, so the pg_wal directory already existed. Evidently the tarfile module in step 4 is fine with an existing directory if the tar archive contains a corresponding directory, too. However, it fails on an existing directory when the tar archive contains a corresponding symlink.
This caused the error above (a minimal reproduction sketch follows below).
Hope this helps @omega find your root cause.
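
A minimal, self-contained sketch of that tarfile behaviour (the archive contents and paths here are made up for illustration, not taken from wal-e): a symlink member cannot be extracted over a path where a real directory already exists.

# Reproduce the failure: extracting a symlink member over an existing real directory.
import io
import os
import tarfile
import tempfile

# Build a tiny archive containing a single symlink member "pg_wal" -> "wal_on_other_disk",
# standing in for a pg_wal directory that lives on a separate disk.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    link = tarfile.TarInfo("pg_wal")
    link.type = tarfile.SYMTYPE
    link.linkname = "wal_on_other_disk"
    tar.addfile(link)
buf.seek(0)

# Extract into a directory where "pg_wal" already exists as a real directory,
# as it does in a freshly initialized data directory.
target = tempfile.mkdtemp()
os.mkdir(os.path.join(target, "pg_wal"))

with tarfile.open(fileobj=buf, mode="r") as tar:
    try:
        tar.extractall(path=target)
    except OSError as exc:
        # Raises e.g. IsADirectoryError/FileExistsError and would abort the restore pipeline.
        print("extraction failed:", exc)

Extracting into a truly empty directory, or removing the pre-existing pg_wal directory first, avoids the failure.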

@fredrik-smedberg

fredrik-smedberg commented Feb 16, 2021

Hi guys!

@vipin030 and I might have found something.

The information below is based on the following system:
CPU cores: 64
Memory: 409 GB
GPG: 2.2.4
libgcrypt: 1.8.1
wal-e: 1.1.0
PostgreSQL: 11.10

When we were trying to restore a large wal-e backup (12+ TB, from a Google Cloud bucket), we discovered that setting --pool-size higher than 12 always gave us lots of decryption errors. Lowering it to 12 made the errors disappear, or so we thought: we now got a single gpg error, exactly the kind in @omega's log above. We've verified that the file is in fact not corrupt. We will try lowering the pool size to 10, restore again, and see whether the error disappears.
