[Snapshot Pro] Backups randomly succeed or fail

I have several sites that I recently set up on a single server that I manage. The success or failure of a backup seems to be random.

It’s very frustrating to deal with a situation like this in Support Chat. The exchanges are too slow and ineffective. I’m posting this in public so others can benefit from diagnostics with Dev Support.

None of these systems are in production. They are all behind a firewall that blocks ports 80, 443, and others. Outbound TCP is not restricted. All WPMU DEV IPs are whitelisted, though the problem appears to occur on the server itself, not after file creation when data should be in transit to Dev.

I have just updated all sites to WP v5.6 and installed Snapshot v4 on all of them. The Ubuntu server running Apache is fully patched and has been restarted after all config changes.

Note from below: PHP memory has been allocated. WordPress memory has been defined. Snapshot v3 settings for chunking were tried with no success. Snapshot v3 was removed from most or all sites. The server has 2 CPUs and 4GB of RAM. Neither CPU nor RAM is maxed out, even with several simultaneous backups in progress.

php.ini:
memory_limit = 512M
max_execution_time = 600
upload_max_filesize = 5M
post_max_size = 5M
file_uploads = On

wp-config.php:
define( 'WP_MEMORY_LIMIT', '128M' );
define( 'WP_MAX_MEMORY_LIMIT', '300M' );
define( 'SNAPSHOT4_FILE_ZIPSTREAM_LOG_VERBOSE', true );
define( 'SNAPSHOT_FILESET_CHUNK_SIZE', 20 );
define( 'SNAPSHOT_TABLESET_CHUNK_SIZE', 20 );

Snapshot has been configured for verbose logging, but the output is not verbose. As an example, these three lines appear consecutively in a .log file. Nothing has been removed; there is no further detail between them in the log.

2021-01-19 12:17:04 -08:00 [info] Snapshot has completed sending back the filelist to the API, so we're ready to begin the actual backup of the files.
2021-01-19 12:17:07 -08:00 [error] The backup has failed to complete. The API responded with: snapshot_failed_SiteNotRespondedZipstreamError
2021-01-19 12:17:15 -08:00 [info] Communication with the service API, in order to retrieve snapshots, was successful.
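
For anyone comparing logs across several sites, here is a rough Python sketch that pulls out only the [error] lines and the API response code. It is not part of Snapshot. The line layout (date, time, UTC offset, [level], message) is an assumption inferred from the three lines quoted above, and the function name find_errors is made up for illustration.

```python
import re

# Assumed layout, inferred from the quoted lines:
# "<date> <time> <utc offset> [level] message"
LINE_RE = re.compile(r"^(?P<ts>\S+ \S+ \S+) \[(?P<level>\w+)\] (?P<msg>.*)$")
# The error line quoted above carries the API code after this phrase.
CODE_RE = re.compile(r"The API responded with: (?P<code>\S+)")


def find_errors(lines):
    """Return (timestamp, api_code_or_None, message) for each [error] line."""
    errors = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") != "error":
            continue  # skip [info] lines and anything that does not fit the layout
        code = CODE_RE.search(match.group("msg"))
        errors.append(
            (match.group("ts"), code.group("code") if code else None, match.group("msg"))
        )
    return errors


# Sample lines taken from the log excerpt in this post, with the year written as 2021.
sample = [
    "2021-01-19 12:17:04 -08:00 [info] Snapshot has completed sending back the filelist to the API, so we're ready to begin the actual backup of the files.",
    "2021-01-19 12:17:07 -08:00 [error] The backup has failed to complete. The API responded with: snapshot_failed_SiteNotRespondedZipstreamError",
    "2021-01-19 12:17:15 -08:00 [info] Communication with the service API, in order to retrieve snapshots, was successful.",
]

for ts, code, _msg in find_errors(sample):
    print(ts, code)
```

find_errors accepts any iterable of lines, so an open file handle for a site's Snapshot .log file can be passed in directly. Running it per site should make it easier to see whether every failure reports the same API code.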

I opened a call yesterday on this. I have opened Support Access to six sites for Dev Support to look at successful and failing backups and their logs.

Let’s find out what’s going on here and resolve it so we can confidently rely on this great tooling. Thanks!