Network file transfer with on-the-fly compression

We often transfer a large number of files, or very large files, over the network from one computer to another. FTP is the default choice for transferring a few files, and SCP is the typical choice for transferring a large number of files.

If you transfer files between computers over a slow network (for example, copying files between a home computer and an office computer), the following tip might be helpful. The technique works as follows:
1) Perform on-the-fly compression of the files at the source computer.
2) Transfer the compressed data over the network.
3) Perform on-the-fly decompression of the files at the target computer.
The technique uses just the SSH and TAR commands and creates no temporary files.

Let us call the source computer HostA and the target computer HostB. We need to transfer a directory (/data/files/) containing a large number of files from HostA to HostB.
1) Command without on-the-fly compression
Run this command on HostB
# scp -r HostA:/data/files /tmp/
This command recursively copies the /data/files directory from HostA to /tmp/ on HostB.

2) Command with on-the-fly compression
Run this command on HostB
# ssh HostA "cd /data/; tar zcf - files" | tar zxf -
This command recursively copies /data/files from HostA into the current directory on HostB, and is a lot faster on a slow network.
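
The scp command places the copy in /tmp/, whereas the tar pipeline unpacks into whatever directory you happen to be in on HostB. Here is a minimal sketch of a wrapper that makes the destination explicit. It assumes a tar that accepts -C (GNU tar and bsdtar both do); the function name pull_compressed and its argument order are hypothetical.

```shell
# Hypothetical helper: copy <remote_parent>/<name> from <host> into <dest>,
# compressing on the remote side and decompressing locally.
# Assumes simple paths without spaces or shell metacharacters, because the
# remote command is handed to the remote shell as a single string.
pull_compressed() {
    host=$1; remote_parent=$2; name=$3; dest=$4
    mkdir -p "$dest" &&
    ssh "$host" "tar -C $remote_parent -zcf - $name" | tar -C "$dest" -zxf -
}

# Example call, roughly equivalent to the scp command shown earlier:
# pull_compressed HostA /data files /tmp
```

Using tar -C on both ends avoids the separate cd step and keeps the local working directory out of the picture.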

Let us take a look at this command in detail:
1) ssh HostA : From HostB, connect to HostA via SSH.
2) cd /data/ : On HostA, switch to the directory /data/.
3) tar zcf - files : Tar the 'files' directory with gzip compression (z) and send the output to STDOUT (f -).
4) | : Pipe STDOUT of the ssh command, which carries the archive from HostA, to STDIN of tar on HostB.
5) tar zxf - : On HostB, decompress and untar the data coming in through STDIN.
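
The same stream can be tried on a single machine, without a second host, by replacing the ssh hop with a plain pipe. This is a small sketch using temporary directories; the directory and file names are made up for the demonstration.

```shell
# Build a throwaway source tree and an empty target directory.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/files"
printf 'line one\nline two\n' > "$src/files/notes.txt"

# Left side plays HostA: archive and compress to STDOUT.
# Right side plays HostB: decompress and unpack from STDIN.
(cd "$src" && tar zcf - files) | (cd "$dst" && tar zxf -)

# diff prints nothing and succeeds when both trees are identical.
diff -r "$src/files" "$dst/files" && echo "trees match"
```

No archive file is ever written to disk; the compressed data exists only inside the pipe.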

To show how useful this technique is, we transferred 45 MB worth of files from HostA to HostB over a DSL connection. Here are the results:
1) No compression method: 12 min 59 sec
2) On-the-fly compression method: 2 min 33 sec
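
To put the two timings in proportion, here is the arithmetic as a short shell calculation. It uses only the two figures quoted above.

```shell
# Convert both timings to seconds.
plain=$((12 * 60 + 59))        # 12 min 59 sec
compressed=$((2 * 60 + 33))    # 2 min 33 sec

# Integer percentage of transfer time saved by compressing.
saved=$(( (plain - compressed) * 100 / plain ))

echo "plain: ${plain}s, compressed: ${compressed}s, saved: ${saved}%"
# prints: plain: 779s, compressed: 153s, saved: 80%
```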

This method is effective for large uncompressed files or for directories with a mix of different file types. If the files being transferred are already compressed, this method won’t help much.
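
One way to tell in advance whether a directory is worth compressing is to compare the size of its plain tar stream with the size of the gzip-compressed stream. This is a rough sketch to run on the source machine. The function name gzip_ratio_percent is hypothetical, and the check reads the whole directory twice, so it suits a quick estimate more than routine use.

```shell
# Hypothetical helper: print the gzip-compressed size of a directory's tar
# stream as a percentage of the uncompressed stream (lower is better).
gzip_ratio_percent() {
    dir=$1
    raw=$(tar cf - -C "$dir" . | wc -c)
    packed=$(tar cf - -C "$dir" . | gzip -c | wc -c)
    echo $(( packed * 100 / raw ))
}

# Example: a value close to 100 suggests the data is already compressed,
# so the plain scp command is likely to be just as fast.
# gzip_ratio_percent /data/files
```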

5 Responses

  1. thrill says:

    Have you tried using the -C flag to ssh?

  2. $mike cremer says:

    Note that this is to likely to help with media files, e.g. videos, music, jpegs.

  3. $mike cremer says:

    Sorry, I meant NOT likely to help. I.e., media is already compressed (that’s the point of JPEG and MPEG encoding).

  4. Greg says:

    That’s interesting information. We do on-the-fly compression with our proprietary software, and have always recognized the benefits; however, I myself have never set up the tests, so I never realized how dramatic the time savings could be. Since our stuff is UDP-based and already “accelerated,” I wonder if we’d achieve the same results. I should get QA to set up an A/B!

    As Mike mentions, previously compressed media won’t benefit, though. You could always add some sort of filter to a script so that any files with known-compressed extensions are sent without compression, then use your method to send everything else. I’m not technical or proficient enough to give you an example, but I can’t see why it wouldn’t work in theory.

  5. Antriksh Pany says:

    ‘rsync -z’ is also a worthwhile alternative.

    Also, apart from ‘scp -C’, you can play around with
    scp -o CompressionLevel=[1-9]
    to achieve the desired compression level. This is a useful thing to tweak if you want a reasonable balance of compression time vs. network transfer time. In general, for higher network speeds and lower CPU power, you would want to use a lower compression level (and vice versa).
