2019: File Sync and Backup

I'd rather pay for online storage. I could have a NAS (Network Attached Storage) or a server, but I don't want to maintain additional network hardware. I want to own fewer things.

File Management Strategy

You can have all your files on a server, and work with them remotely from all your devices. The server takes care of backup with file versioning.
Pros:
- Easy maintenance
Cons:
- Slower file access
- No access offline

You can sync your files to all your devices, including a copy on the file server, which provides backup with file versioning.
Pros:
- Fast access to files
Cons:
- Overhead in terms of system resources (especially CPU usage for encrypted file transfers)
- Higher file system maintenance on devices

A compromise is to have all the files on the server, and sync only the sets of files you use frequently. Also, you can mount the server file system as a network drive. This allows you to quickly browse/search for a file as if you had all of them locally. You can also use your file manipulation tools/programs, including the ones in the Windows File Explorer context menu, as if the files were local. However, treating server files as if they were local requires downloading the files you are processing, as a background process. Downloading files as a background process can take a very long time unless there are only a few files, so this may not always be practical.

Choose your poison.


NetDrive

Below I tested the speed and reliability of uploading my files with a combination of NetDrive and other programs. NetDrive mounts Google Drive storage as a network drive in Windows. My testing has been with:

NetDrive 2.6.16
Windows Explorer 2600
FreeFileSync 8.6
ViceVersa Plus 2.4.2
Roadkil's Unstoppable Copier 3.12 and 5.2
Supercopier 4.0.1.4
RichCopy 4.0.217

software combo                  time stamp kept   uses cache   delay between files
NetDrive + Windows Explorer     yes               yes          no
NetDrive + FreeFileSync         no                yes          yes
NetDrive + ViceVersa            yes               yes          yes
NetDrive + Supercopier          yes               yes          yes
NetDrive + Roadkil              yes               yes          no
NetDrive + RichCopy             yes               yes          no

There seems to be a delay between each file upload in every program other than Windows Explorer and Roadkil's Unstoppable Copier. These programs appear to wait for confirmation that each file has been copied, which causes the delay. The result is that Windows Explorer and Roadkil copy files 2-5 times as fast as the others. If it's just one huge file, the delay is not a factor. If it's many tiny files, the delay between each file transmission makes the upload slow as molasses.
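To see why a per-file delay hurts many tiny files but not one huge file, here is a rough sketch in Python. The 0.5 second delay and the two workloads are made-up numbers for illustration, not measurements; only the 1.5 Mbps link speed comes from my own setup.

```python
def upload_seconds(file_sizes_bytes, bandwidth_bits_per_s, per_file_delay_s):
    """Estimate total upload time when every file pays a fixed extra delay."""
    transfer = sum(file_sizes_bytes) * 8 / bandwidth_bits_per_s
    waiting = len(file_sizes_bytes) * per_file_delay_s
    return transfer + waiting


BANDWIDTH = 1_500_000   # bits per second (the upload cap I see with Google)
DELAY = 0.5             # assumed seconds of confirmation wait per file

# Two hypothetical workloads, both 100 MB in total.
one_big = [100_000_000]
many_small = [10_000] * 10_000

for name, files in (("one big file", one_big), ("10,000 small files", many_small)):
    no_delay = upload_seconds(files, BANDWIDTH, 0.0)
    with_delay = upload_seconds(files, BANDWIDTH, DELAY)
    print(f"{name}: {no_delay:.0f} s without delay, {with_delay:.0f} s with delay")
```

The transfer time is identical for both workloads; only the waiting term grows with the number of files.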

While using Windows Explorer for uploading files, in rare instances I get an I/O error, and the copying terminates unfinished. Because the upload was incomplete, I had to figure out what had and hadn't been copied using ViceVersa. Also with Windows Explorer, the copying was interrupted by a notification of stream (metadata) loss, which prompted me to confirm that I was ok with that. Apparently, my previous use of Dropbox added some metadata to my files that I don't really care for. The notification happened while I was sleeping, so the copy sat waiting for me. Roadkil's Unstoppable Copier is the right tool for this job.

Update: I started using RichCopy, which is multithreaded and optimized for the high latency associated with network use. In this case there is even more latency, as it's the internet instead of the LAN. However, I get a lot of errors with the string “the system cannot find the file specified”. So it works faster, but it's still a pain. At least the job continues if there is an error.

How much faster is RichCopy? At least 3 times as fast as Windows Explorer. In the graph below, Windows Explorer and RichCopy were running at the same time, with the same set of files in number and size (to be fair, I should have tested each without the overlap in time, but… meh). The graph does not represent the full amount of time, but it does show how much more effective RichCopy is at using all the bandwidth. After RichCopy finished, the amount of green became scarce, as Windows Explorer chugged along one file at a time.

Also note that Google seems to cap my upload at 1.5 Mbps. I'm really envious of people who are getting 57 Mbps upload. That's not possible for me, as my upload bandwidth limit is 10 Mbps (Ookla Speedtest). Though I don't think my bandwidth matters if I can't get around Google's cap. Or could it be that my ISP allows a higher upload bandwidth for the speedtest than for other sites?

I couldn't find decent documentation on recommended settings for RichCopy, so I winged it and set my own below. I first tried “file copy t.c.” = 4 and “directory copy t.c.” = 30, which produced the results above (left side). Confusingly, “directory copy t.c.” means the number of files being copied simultaneously, and “file copy t.c.” means breaking up a file into several parts.

Later on, I tested RichCopy vs GoodSync, and I was seeing delays in the file copy progress of both programs. The job had many, many small files. I decreased “file copy t.c.” to 1, and increased “directory copy t.c.” to 50, but still there were delays. Is something about my setup easily overloaded with simultaneous connections? Maybe the Wi-Fi? Maybe the Netmeter producing the above chart? Maybe NetDrive, as right-clicking a folder in the network drive takes a long time during file copying. Although that wouldn't explain GoodSync's delays.

My conclusion is that if there are many many small files, RichCopy and Goodsync cannot fill the available upload bandwidth because they can only initiate new uploads at about two per second. This may be a Google thing. This keeps the simultaneous upload number down, because the uploads are completing faster than new ones can initiate.
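The reasoning above can be put into numbers. This is only a back-of-the-envelope sketch: the two-starts-per-second figure and the 1.5 Mbps cap are my observations, while the file sizes are hypothetical examples. The concurrency function is just Little's law (items in flight = start rate × time each item spends in flight).

```python
def average_concurrency(starts_per_second, seconds_per_upload):
    """Little's law: uploads in flight = start rate * duration of each upload."""
    return starts_per_second * seconds_per_upload


def achievable_bits_per_s(starts_per_second, file_size_bytes, link_bits_per_s):
    """Throughput is limited by the link or by the start rate, whichever is lower."""
    return min(link_bits_per_s, starts_per_second * file_size_bytes * 8)


LINK = 1_500_000   # bits per second
STARTS = 2         # new uploads that can begin each second

for size in (10_000, 200_000):   # hypothetical file sizes in bytes
    rate = achievable_bits_per_s(STARTS, size, LINK)
    print(f"{size} byte files: {rate / 1e6:.2f} Mbps of {LINK / 1e6:.2f} Mbps")
```

With small enough files, the start rate becomes the limit long before the link does, no matter how many threads are configured.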

RichCopy settings I settled with:

directory copy t.c. 30
file copy t.c. 1
file copy c.s. 1024
search t.c. 20
search c.s. 300
process prio 2

While I won't be using FreeFileSync due to lack of time stamp transference and slow copying, I do like the differencing-style GUI, and that it can do file versioning. GoodSync also has the differencing-style GUI, which allows me to preview what will happen in a sync, before I allow it to happen.


GoodSync

GoodSync is multi-threaded, uploading several files at once. GoodSync's multi-threading setting can be found in “job options”. I set it to 40, and have the results in the image below. It works on its own, not related to NetDrive. The scale in the graph is 1.5Mbps, while the previous graph has a 3.0Mbps scale because of some download spikes. RichCopy + NetDrive and GoodSync are comparable in performance, delivering the solid fill of bandwidth you see below.

With GoodSync however, there is a delay before copying starts, first while it analyzes what it will do, then again while it manages its state files. GoodSync differs from RichCopy in that it keeps a database to track its work, so that it knows if you've renamed a file or folder, and doesn't recopy everything to make a mirror on the online storage. At least that's a claim I hope is true. “GoodSync detects file/folder renames and executes them as Move commands.” Feature list here.

I can only hope the initial delays reduce after Goodsync has state files in place.

I had issues with GoodSync + Google a few years ago, but tried it again because it integrates the differencing GUI, the online storage connection, the multi-threading, and the file versioning, all-in-one. Basically everything I want. If it's well programmed, then the integrated solution is optimal because the parts were designed to work together.

No error messages with Google Drive like I had before. Maybe because I'm using the latest version now, or maybe because the Google Drive API has improved. I need to check with the older version. The reason I was using an older version before, is that the newer version would BSOD (crash) my system the second I tried to install it.

I got around this by installing GoodSync in Sandboxie. Then I copied the program files out of the sandbox, and it would run ok sometimes, but BSOD my system when launching on other occasions. I got around that by renaming one of the GoodSync program files: gs-runner.exe to gs-runner.exe.bak (I don't need its functions).

I'm starting to get a little suspicious. Still I persist. I was getting a BSOD with a different message than before, but more randomly, and not seemingly caused by GoodSync itself. I decided it had to be some combination of NetDrive, GoodSync, and the traffic monitors. NetDrive was having issues losing its credentials with Google, causing me to have to reconnect it to my account.

I tried a different traffic monitor named Networx, and GoodSync and Networx simultaneously locked up, although after 10 seconds or so they closed gracefully. I restarted my system, and tested running NetDrive with RichCopy and the traffic monitor Netmeter (afraid to touch Networx). At the worst, NetDrive made Windows Explorer hang for about 5 seconds. No BSOD, and I predict that all future use of NetDrive will be reliable like this.

Sandboxie is a program that is made to keep programs from doing bad things to your system. I think I remember I couldn't get Goodsync to run in the sandbox? I tried running NetDrive in the sandbox, and it performed normally. I tried Goodsync again, and it launched! Perhaps because of me renaming its evil file “gs-runner.exe”? So now Goodsync is happily herding my files to the Google Drive. I'm hoping this is my working solution. Is Goodsync actually virtuous, if it needs a sandbox to behave?

3 days later, no BSOD while running Goodsync within a Sandboxie sandbox, concurrently with NetDrive, Netmeter, and RichCopy.

Comparing time to completion for analyzing the differences between two sets of folders to sync, GoodSync and ViceVersa+NetDrive take the same amount of time. ViceVersa (version 2.4.2) was set to “file size and time stamp”, and it seems logical that GoodSync is doing the same.
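For reference, a “file size and time stamp” comparison can be sketched in a few lines of Python. This is my own minimal illustration of the idea, not how ViceVersa or GoodSync actually implement it. The function names are made up, and time stamps are truncated to whole seconds on the assumption that remote storage may not keep finer resolution.

```python
import os


def snapshot(root):
    """Map each file's path relative to root to (size, whole-second mtime)."""
    result = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            info = os.stat(full)
            result[os.path.relpath(full, root)] = (info.st_size, int(info.st_mtime))
    return result


def compare(source_root, target_root):
    """Return (missing_on_target, extra_on_target, different) as sorted lists."""
    source, target = snapshot(source_root), snapshot(target_root)
    missing = sorted(set(source) - set(target))
    extra = sorted(set(target) - set(source))
    different = sorted(p for p in set(source) & set(target) if source[p] != target[p])
    return missing, extra, different
```

No file contents are read, only directory metadata, which is why this kind of comparison is fast locally and bound by listing speed over a mounted network drive.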


WebDrive vs NetDrive

I decided to give WebDrive a spin. Version 16 still has the issue of version 13, where the time stamp is not preserved when using ViceVersa. They still haven't fixed the issue!

I tested the speed of folder comparison between WebDrive and NetDrive, using two simultaneous instances of ViceVersa set to “file size and time stamp”. NetDrive finished in half the time. Whoa!

Although, starting up, NetDrive loads a lot slower than WebDrive. Also I've had some weird stability issues with NetDrive, that I can't pinpoint yet… could be something else? Willing to investigate further for the speed.

The NetDrive3 website says: “If you use Windows Explorer when copying files to remote storage, NetDrive does not use write cache because there must be whole file.” Despite that, on my system, Windows Explorer file copying also uses the NetDrive cache. The cache is only used one file at a time. After a file is finished transferring, it becomes a zero byte file. You end up with hundreds of thousands of zero byte files in one folder. Also, the NetDrive cache should be at least as big as your biggest file, or the upload will not succeed. For example, I have an OSX iso that is 7GB.
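To size the cache, it helps to know how big your biggest file is. Here is a small Python sketch that finds it; the function name is my own, and it has nothing to do with NetDrive itself.

```python
import os


def largest_file(root):
    """Return (size_in_bytes, path) of the biggest file under root, or (0, None)."""
    biggest = (0, None)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue  # skip files that vanish or cannot be read
            if size > biggest[0]:
                biggest = (size, full)
    return biggest
```

Calling largest_file on the folder you plan to upload gives the minimum cache size in bytes, along with the file responsible for it.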


Google Drive API and Service

A feature I desire, is to not have forced encryption or compression of my files before transmission, to reduce system resources demanded from my laptop. It's the reason for some of the unfinished entries in this article: I was looking for an enterprise solution to control all the settings of the server. However, NetDrive2 (current release is NetDrive3) and GoodSync don't appear to do any compression or encryption for file transmission, in conjunction with Google Drive.

This conclusion is based on the lack of CPU usage during transmission. For NetDrive2, I couldn't find any reference on the web regarding file encryption in conjunction with Google Drive.

Instead of encrypting everything, you can use solutions for certain files rather than all of them. LibreOffice has password-encrypted documents, and 7-Zip can password encrypt and compress sets of files. Empathy lets you password protect executables so you can only run them after entering the password. I am willing to give up the privacy factor when it comes to the files I choose to leave unencrypted.

Part of the importance of encryption is privacy:
https://www.zdnet.com/article/no-privacy-on-amazons-cloud-drive
https://mspoweruser.com/watch-what-you-store-on-skydriveyou-may-lose-your-microsoft-life

Having subscribed to the Google Drive 200GB service, I get Google One support. The chat support is excellent. Wait time has been 10 minutes once, and 30 seconds the next time. I'm on the computer anyway, so the short wait isn't bothersome.

The reason for the request for support, is that I wanted to use a new google account that I created with a temporary phone number. The situation is that even after I deleted the number, they still want verification via the deleted number. Mind you, they wanted verification, not when I used the credit card to set up the 200GB Google Drive. They wanted verification when I tried to connect NetDrive to Google Drive (“suspicious activity”).

Google support manually allowed NetDrive to connect despite me not having the phone number anymore, but in order to get rid of the number from my account, they “escalated” my case and told me I would hear back within 24 to 48 hours. Almost a week later, I have yet to hear back from them, although an agent said he'd get back to me personally to give me an update.

Meanwhile, WebDrive connected without any intervention. An old version of Goodsync didn't connect, ever. Every time, Google's bot disallowed it because of “suspicious activity”, “please verify your account”. However, when I got the new version of Goodsync running, it connected without issue.


Duplicati

(under construction)




Nextcloud

I've been using Nextcloud, installed on my web hosting server, for the sole purpose of syncing passwords between my Android phone and Pale Moon browser on Windows. Nextcloud aims to be collaboration software rather than file sync, but I've been using it as the latter. It's super easy to install. My preference is to dedicate the root of a subdomain for this purpose, so I follow these steps:

  1. Download the version of NextCloud you want to install (usually the latest version)
  2. Upload it to the server.
  3. Extract the files.
  4. Move the files to the root of the subdomain.
  5. Open web browser to the root of the subdomain: http://cloudindasky.example.com
  6. Choose a name and password for the admin account.

Then you do this on your machine to sync files with the server.

In June of 2019, I got an error message, saying that a file on the server was locked. I found this page, which gave me a workaround. In the case of using shared hosting, where I can't install a more sophisticated database sharing solution called Redis, I should just disable file locking.

“…edit your configuration file config/config.php:
'filelocking.enabled' => false,

Now I have to back up my database in case of corruption? It's unlikely my phone will try to write to the database at the same time as my desktop, considering it's just me.

I try to avoid databases as much as possible. I really like Dokuwiki's non-database solution (this site runs on Dokuwiki). If a Dokuwiki page has changed, I get a message telling me to keep the change or overwrite it with my current edit. In Windows, I am using a file unlocker all the time, without ill effect. I wish I could disable file locking in Windows. On a few occasions I can't unlock a Windows file, and I have to wait until the next reboot in order to delete it. I don't know what the consequences would be if I were able to delete it against the system's wishes.

While I use Nextcloud on my web hosting server, I don't use it for file backup. My web hosting account, while offering “unlimited space”, is not meant for file storage. I'm paying like $2.50/mth.


NextCloud Connect to Google Drive

I wanted to connect Nextcloud to Google Drive, to be able to see the contents of Google Drive via NextCloud. I guess this may be useful in the future, if I want NextCloud to sync files between two online accounts, without my laptop as intermediary.

In order to accomplish NextCloud → Google Drive, I needed to install version 13 of NextCloud. On newer versions 14 and 16, the apps page wasn't loading. I posted the issue to the NextCloud forum. So with the apps page loading in version 13, I followed the following steps:

  1. Go to NextCloud → ProfileIcon → Apps → Files, enable the “External storage support for Google Drive”
  2. Go to https://console.developers.google.com → Credentials → Create Credentials → OAuth Client ID → Web Application → Create.
  3. Follow these dated instructions to get a copy of the Client ID and Client Secret.
  4. Go to NextCloud → ProfileIcon → Settings → Administration → External Storage (2nd one)
  5. Make up a name for the folder, select Google Drive, select OAuth2, enter the Client ID and Client Secret, add access to admin and yourself, click save.

Now click the files icon on the top left, and see that you can't access Google Drive, because the server doesn't have an smbclient module installed. The PHP smbclient module is preferred. OMG, it's so easy!

So, on Hostinger's shared hosting, they don't give you the PHP smbclient module. You have to get their VPS plan, which I have been trying to avoid, for the sake of simplicity. Yes, I know how hard that is to believe, that I would want simplicity.

Connecting via PuTTY, I tried to follow instructions on how to install a PHP extension on a shared host, but found that the gcc command was unavailable.


2019: Servers and Object Storage

Everything below this has no finished outcome/solution, but can be useful as reference. I left a question over at serverfault.com. Having no response, I had to provide my own sort-of-an-answer.

Motivation for a Virtual Private Server

I want to configure my own server, so I can have granular control over the settings. I want to tone down the encryption/compression so that communication with the server is easier on my laptop's system resources. Chances are you have a speedy machine and don't care. You would be happy with the defaults of some commercial service for that purpose (like Backblaze, SpiderOak, or Carbonite). However, if I figure out how to get this done, and give you easy instructions on how to set up your own server, then you will have a choice besides the commercial option. It might be better than consumer options? I don't know enough to answer this question.

Update 20200424: To get around problems of object storage described in the following sections, one could create a persistent disk for a VPS. Google for example: https://cloud.google.com/compute/docs/disks#pdspecs. There is also Google Filestore, which “provides a consistent view of your filesystem data” https://cloud.google.com/filestore


Cloud Storage File System

Online Storage Services use Object Storage, as opposed to file systems which manage data as a file hierarchy. With Amazon S3 and Google Cloud Storage, there is no transparency: you have to deal with the differences. Even with Google drive, when I deleted a few gigabytes of data all at once, I saw files lingering around that were supposedly deleted. Kind of makes you question your sanity.

Take a look at what Webdrive says about Amazon S3:

“ - S3 does not have a concept of folders, but WebDrive uses the S3 prefix/delimiter option to make it appear as folders with '/' being the delimiter. This allows you to create folders with WebDrive and put files into them.
- S3 does not support a rename operation. So to rename a file WebDrive downloads the file then uploads it to the new name and deletes the original. This means you can not rename a folder that has files in it.

Remember buckets are global in S3, so after mapping a drive to the S3 service. You should use the DOS prompt to create the folder and not use explorer if created a bucket/folder at the root level since it's global. Something like “New folder” will fail because somebody else in the world has already created one. So open a DOS prompt and enter something like “MKDIR my-name-that-is-sure-to-be-unique” perhaps having your name, or some numbers in them. Something like “Docs” will fail. ”


Preserving Modified Date for a File using Buckets

Netdrive: https://support.bdrive.com/t/the-created-modified-date-of-files-seem-to-spontaneously-change/53156
Unfortunately, the protocol sftp doesn’t support SetFileTime for a file in remote storage.
It’s not supported by sftp as well as webdav.
Some ftp server supports it.
GoogleDrive or box support it.

Cloudmounter: https://www.eltima.com/forum/index.php?topic=128373.0
I've received an update from our developer regarding this matter. First of all, let me clarify that not all servers are able to give the date parameter.
With AWS however, the creation date is not transmitted on any server. This is a server side configuration and, unfortunately, we cannot affect this from our side.

S3: https://forums.aws.amazon.com/thread.jspa?messageID=738308 Currently S3 does not support customizing the Last Modified value, as per the following documentation
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
However, one way you can achieve this would be to use the AWS CLI to store this information when using the 'aws s3api put-object' command to upload files and appending the following argument:
--metadata (map) A map of metadata to store with the object in S3.
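As a sketch of that suggestion, the Python below builds the argument list for an 'aws s3api put-object' call that records the local modified time as user metadata. The key name "mtime" is my own invention, so any tool that later restores the time stamp would have to agree on it; the bucket and key are placeholders.

```python
import os


def put_object_command(local_path, bucket, key):
    """Build an 'aws s3api put-object' argument list that records the local mtime."""
    mtime = int(os.path.getmtime(local_path))
    return [
        "aws", "s3api", "put-object",
        "--bucket", bucket,
        "--key", key,
        "--body", local_path,
        "--metadata", f"mtime={mtime}",  # "mtime" is a made-up key name
    ]
```

The list can be handed to subprocess.run on a machine where the AWS CLI is installed and configured. Note that this only stores the value alongside the object; S3's own Last Modified field still reflects the upload time.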

Cyberduck: https://trac.cyberduck.io/ticket/6441
Preserve modification date does not work with Amazon S3 to Preserve modification date
Type changed from defect to enhancement
Setting timestamps is not currently supported in S3 because the API has no such feature per se. We could add our own proprietary metadata to the uploaded files to store the modification date.
Resolution set to wontfix
Status changed from new to closed

Using metadata to store modified date in Google Cloud Storage
Command line tool for Linux that works


File Sharing Protocols

WinSCP provides a good comparison of supported file system features using the communication protocols WebDAV, SFTP, and FTP. The way I understand it, the focus of WebDAV is for user collaboration, while SFTP focuses on providing a remote file system.

According to WinSCP, SFTP is slow because of the SSH encryption, and because of the need for packet confirmations. If I am setting up my own server, I could possibly configure SFTP to run with a lower encryption level and bypass confirmation packets? There is some speculation that it is the WinSCP implementation of SFTP that is the problem. Not finding consistency on the web leaves me unsure about which protocol to choose. This topic has been a rabbit hole for me, for years.


Amazon Throughput Rate

“ …sc1 uses a burst-bucket model for performance. Volume size determines the baseline throughput of your volume, which is the rate at which the volume accumulates throughput credits. Volume size also determines the burst throughput of your volume, which is the rate at which you can spend credits when they are available. Larger volumes have higher baseline and burst throughput. The more credits your volume has, the longer it can drive I/O at the burst level.

For a 1-TiB sc1 volume, burst throughput is limited to 80 MiB/s, the bucket fills with credits at 12 MiB/s, and it can hold up to 1 TiB-worth of credits. ” source

I conclude that for my meager 150GB, my base throughput rate will be ~1.6 MiB/s, and I'll have a burst rate of ~11 MiB/s. The credit bucket will hold at most 150GB worth of burst, and it will refill at 1.6 MiB/s. I don't think I will ever drop from burst rate!
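Here is the arithmetic behind those numbers as a Python sketch. It assumes the quoted 1-TiB figures scale linearly with volume size, that the credit bucket is as large as the volume, and that my 150GB is decimal gigabytes. It ignores any minimum volume size Amazon may impose.

```python
MIB = 2**20
TIB = 2**40


def sc1_rates(volume_bytes):
    """Scale the quoted 1-TiB sc1 figures linearly; returns (baseline, burst) in MiB/s."""
    tib = volume_bytes / TIB
    return 12 * tib, 80 * tib


volume = 150 * 10**9   # 150 GB, decimal
baseline, burst = sc1_rates(volume)
print(f"baseline {baseline:.2f} MiB/s, burst {burst:.2f} MiB/s")

# A full bucket drains at (burst - baseline), because it keeps refilling while in use.
bucket_mib = volume / MIB
hours_at_burst = bucket_mib / (burst - baseline) / 3600
print(f"a full bucket lasts about {hours_at_burst:.1f} hours of continuous burst")

# The 1.5 Mbps upload cap I measured, converted to MiB/s for comparison.
upload_mib_s = 1.5e6 / 8 / MIB
print(f"my upload cap is {upload_mib_s:.2f} MiB/s")
```

Since my upload cap is well below even the baseline rate, the bucket would refill faster than I could ever drain it from home.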


VPS via Command Line

Do I need to install a Linux SFTP server to make use of its file system? Much as I want to avoid maintaining a server, the alternative is being at the mercy of proprietary solutions (trials and tribulations below). However, taking a quick look around, a VPS with the 150GB of storage I want is expensive, starting at about $30/mth.

Google Cloud, and probably Amazon Web Services as well, offers a free weakling-level VPS. The one at Google gets a whole one fifth of a CPU core.

The server would be using FUSE to simulate a hierarchical filesystem on the mounted buckets (similar to what WebDrive described above). The following is a technical overview from google:

“Cloud Storage FUSE works by translating object storage names into a file and directory system, interpreting the “/” character in object names as a directory separator so that objects with the same common prefix are treated as files in the same directory. Applications can interact with the mounted bucket like a simple file system, providing virtually limitless file storage running in the cloud. … While Cloud Storage FUSE has a file system interface, it is not like an NFS or SMB file system on the backend.” source
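The translation Google describes can be illustrated with a toy Python function. This is only a sketch of the idea over a plain list of object names; it is not how Cloud Storage FUSE or WebDrive are actually implemented, and the object names are made up.

```python
def list_directory(object_names, directory=""):
    """List the immediate subfolders and files under a '/'-terminated prefix."""
    files, folders = set(), set()
    for name in object_names:
        if not name.startswith(directory):
            continue
        rest = name[len(directory):]
        if not rest:
            continue  # a placeholder object standing for the directory itself
        head, slash, _tail = rest.partition("/")
        if slash:
            folders.add(head + "/")   # something deeper: show only the folder
        else:
            files.add(head)
    return sorted(folders), sorted(files)


objects = ["docs/a.txt", "docs/sub/b.txt", "docs/sub/c.txt", "readme.txt"]
print(list_directory(objects))
print(list_directory(objects, "docs/"))
```

The bucket itself stays flat. Folders only exist as a way of reading the names.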

Which is better: to have local FUSE software interact with the buckets on the server, or to have FUSE software running on the server that interacts with its own local buckets? Based on what I've read recently, running commands over HTTP is not ideal because HTTP wasn't designed for this purpose. As far as I can tell, WebDAV is an extension of HTTP, while SFTP runs over SSH instead.

Let me think of an example: if I were to rename a folder with 1000 files, then this would mean 1000 file rename instructions going to the server. Those are commands traveling over long distances, and my experience with the internet is that sometimes packets get dropped. I think FUSE happening within the IaaS would be advantageous.
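To make the cost of that rename concrete, here is a Python sketch that uses a dictionary as a stand-in for a bucket. It assumes the store has no rename operation, so each object needs a copy request and a delete request; the folder and file names are made up.

```python
def rename_folder(bucket, old_prefix, new_prefix):
    """Rename a 'folder' in a flat key/value store; return the number of requests."""
    requests = 0
    # Collect the keys first so the dictionary is not modified while iterating.
    for key in [k for k in bucket if k.startswith(old_prefix)]:
        bucket[new_prefix + key[len(old_prefix):]] = bucket[key]  # copy request
        del bucket[key]                                           # delete request
        requests += 2
    return requests


bucket = {f"photos/{i:04d}.jpg": b"" for i in range(1000)}
request_count = rename_folder(bucket, "photos/", "pictures/")
print(request_count)
```

Every one of those requests is a round trip, so the shorter the distance between the FUSE software and the bucket, the less a folder rename hurts.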

The VPS interacting with the buckets is no different than my own machine doing so, except that:

  • the physical distance from the VPS to the bucket storage may be less and require fewer hops
  • the FUSE implementation may be superior on either the locally running software or the VPS server




VPS with Web Based GUI

Duplicati

Syncthing

“ I really like nextcloud - it's a lot better than manual sync or half-automated SMB-stuff - using it for CalDAV and CardDAV currently, but I personally sync my files with syncthing.

Nextcloud has a nice interface and provides good access control - however I experienced the sync to be really slow (and buggy using an android device). Syncthing is really fast and has a really low performance impact. It's worth a look. ” source

The free and open source Syncthing runs both on your computer, and also on the server. Versions are available for both Windows and Linux.

The efficacy of running Syncthing on Amazon EC2 needs to be verified, as the posts in this forum are inconclusive. This video on youtube shows the ease of using the Syncthing interface.

Working with a VPS at Amazon EC2 or Google Compute Engine, you don't have a GUI. You only have the command line prompt. You create your own GUI for working with the VPS by setting up access to programs like Syncthing and Duplicati. These programs run in a web browser.

One deficit of Syncthing is that it doesn't have the option to display the results of the sync before it occurs. The alternatives I have to offer only run in the local operating system, not over the web: FreeFileSync or ViceVersa. I'm afraid sync can sometimes do something you don't want, often due to user error (making assumptions about what it will do, rather than what it actually does). With sync that doesn't show you the results before executing, best practice would be to first make an offline backup, and second to practice with a small set of files.

Syncthing will not mount the remote file system as a network drive, but rather syncs it to a folder (much like Dropbox). You can use symbolic links and junctions.

To check Syncthing's results and to experiment with different methods, you can also mount the remote storage using WebDrive, NetDrive, etc, by running an SFTP server on the VPS, or more easily by connecting directly to the bucket storage. It's fine to concurrently run Syncthing and mount the remote files, but better not to be writing to the same file sets at the same time.

In the past I've had issues using ViceVersa with WebDrive and Google Drive. Integrated solutions, or solutions that have been thoroughly tested to work together, are less likely to have bugs.

Running a bare-bones virtual machine (VPS) on either Amazon or Google is free so long as it stays within its resource limits. You would only be paying for Amazon's Elastic Block Store (EBS) or Google's Standard Persistent Disk. The EBS volume or Persistent Disk would be mounted on the virtual machine (VM) instance. Comparison of terminology

Syncthing uses Block Exchange Protocol. So not SFTP, nor WebDAV, nor SMB, as I was considering before.

Amazon: $0.025 per GB-month of provisioned storage (sc1)
Google: $0.040 per GB-month of standard provisioned space

For 150GB, Amazon is $3.75/mth. Compare that, not with GCS, but Google Drive, which offers 200GB for $3/mth. However, I want control of the server.
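The same calculation as a small Python sketch, using only the per-GB prices listed above (which will have changed since I wrote this). The function and dictionary names are my own.

```python
PRICES_PER_GB_MONTH = {
    "Amazon EBS sc1": 0.025,
    "Google standard persistent disk": 0.040,
}


def monthly_cost(gigabytes, price_per_gb_month):
    """Provisioned block storage is billed on the size you reserve, not what you use."""
    return round(gigabytes * price_per_gb_month, 2)


for name, price in PRICES_PER_GB_MONTH.items():
    print(f"{name}: ${monthly_cost(150, price):.2f} per month for 150 GB")
```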


2018

In the past I was somewhat confused about the big picture because I was wrestling with the details, much the way I still am. Following these links is preferable to reading the rest of this article:
https://www.backblaze.com/blog/sync-vs-backup-vs-storage
https://www.cloudwards.net/how-to-set-up-a-cloud-network-drive

Doing your own file synchronization, like with FreeFileSync, or ViceVersa, takes a lot of work, especially if you are always organizing your files, and doing so on more than one computer. It's better if you can use automatic sync, or work off of a network drive. While doing a manual sync, for example, looking at differences, you won't be sure if you want to delete certain files: you're not sure if you moved them to another location while organizing. If you don't delete, you will end up with duplicates. I'm thinking it may be better to leave merging to an online backup/sync service. As a bonus, many of these services provide versioning.

If your files go over the quota of the service, you need to create an archive of old, stagnant files. Store the old, stagnant files outside of sync. You can make your own external backup of these files. Even if you are within quota, having fewer files in the sync makes for less overhead. From my experiences in 2016, it's best to have the fewest files possible for online backup services. It doesn't matter if Mega gives you 50GB, or Google Drive 100GB for pocket change. Don't use that much!
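One way to find candidates for such an archive is to list files that haven't been modified in a long time. The Python sketch below does that; the function name and the idea of using the modified date as the measure of “stagnant” are my own assumptions, and it only lists files, it doesn't move anything.

```python
import os
import time


def stale_files(root, max_age_days, now=None):
    """Yield (path, size) for files not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            info = os.stat(full)
            if info.st_mtime < cutoff:
                yield full, info.st_size
```

Summing the sizes from stale_files(folder, 365) tells you how much space would leave the sync if everything untouched for a year went into the archive.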

In general, it's better to have fewer files. I used to think that in the virtual world of computers, you could have a practically unlimited amount of stuff (unlike the real world where I want to have as little as possible). Well, keeping less stuff also applies in the virtual world. Keeping files requires maintenance.

I'm currently using Mega (megasync). Things I like about Mega:
- It can be made portable (doesn't need to be installed)
- It's outside of the United States, far away from local government (law enforcement, NSA, etc, are like hammers, and to hammers everything is a nail)
I think all the services are rather comparable: I can't say one is that much better than the others. Mega is lightweight on the system as long as your number of files is low. I believe the larger part of the reason Mega can consume so much system resources (as with any of the services) is that file encryption is CPU intensive. I wish there was a way to disable or reduce the file encryption. I'm just not a big fish for any fisherman out there.


2016

I have figured out how to classify the services available, which makes it much easier to know what I want. I have labelled them as follows.

  • services: online storage services that have sync capability via an API (application programming interface), and/or their own sync program
  • clientside: sync programs running on your computer that work with the aforementioned services via API.
  • serverside: an online service with unified sync to multiple storage services, where the server handles the aforementioned services via API.

Of the three, I think serverside could be the most favorable, because there is less work for your computer to do. Your computer only has to interact with one service, and that service (aka the server) takes care of all the other services (such as Google Drive, Dropbox, etc). So why use multiple services, instead of just one? To compare them, of course! They have different features. Also, if one online service closes down, it's easy to move files to another. Google definitely drops services on a whim.

  • Examples of services are Google Drive, Dropbox, Mega, etc.
  • Examples of clientside are Insync, Goodsync, Webdrive, etc.
  • Examples of serverside are Koofr, Odrive, Storage Made Easy, Cloud Combine, and Air Explorer.

Most people will be fine with any free account, unless they have a lot of files. I have around 160GB. It is not just a matter of finding that much space. With all the encryption, compression, and file comparison that these sync programs do, your system resources take a hit. Dropbox states that their application will slow down your computer at around 300,000 files. I have ~200,000, and my old computer becomes unresponsive. Most of my space is filled with software, especially with all the remastering of Windows that I have done.
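Before pointing a sync program at a folder, it helps to know how many files and how many bytes are in it. A small sketch, with a function name of my own choosing; you can compare the count against whatever limit your service documents (the 300,000 figure above is the one attributed to Dropbox).

```python
import os


def tree_stats(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    count = 0
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # don't count a link as a file of its own
            count += 1
            total += os.path.getsize(full)
    return count, total
```

Usage would look like `count, size = tree_stats("D:/files")`, then `print(count, size / 1024**3)` to see the size in GB.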

I can't help but recall that when I was using Carbonite between 2007 and 2012, it could handle backing up all my files with an acceptable toll on system resources. That only holds once Carbonite has caught up with uploading all the files currently on your computer. The same goes for Dropbox. The trick is to leave your sync or backup program running when you aren't at your computer. Once the sync has caught up, you can leave the sync program running all the time.

In all these services and applications, I wish there were a way to opt out of encryption/compression, in order to use less CPU and memory. Only a select few of my files need any security at all.

I have experienced one exception to the high resource usage of a sync program confronted with a mountain of new files: Mega. It was averaging only 10% of my CPU, and used much less RAM than Dropbox. Thank you, Mega.

The following sections should be read in reverse order, to be chronologically correct.


20160420 Dropbox, Mega, Webdrive

Dropbox brings my computer to a halt: it consumes CPU and memory like crazy (not only in its own process, but in explorer.exe as well!). I'm hoping that once it catches up on uploading all my files, it will stop being a pain. I have even moved all my files into their folders, rather than using symlinks. They may say it is 1TB (I'm using a business trial account for a month), but that's a joke if the desktop app can't handle that many files.

I had been waiting to hear from anyone at Dropbox, after many requests for help. The result was a useless canned response, and a few days later, they announced that they were dropping support for Windows XP. I watched in FileMon as it kept wastefully looking up a Desktop.ini file at the root of the Dropbox folder, more often than it did anything useful, like syncing my files. It almost seems like the high resource usage is intentional. I responded by using NTFS ACLs to deny access to that file.

To make it more usable, I split up my files into less than 50GB per account. So I currently have one business and one personal account, which are tied together and which I can use via the desktop Dropbox app. Then there is another business Dropbox account, to which I send files through Webdrive, and finally a Mega account.

Mega has behaved stupendously well. It just works.

I have been trying to deal with Webdrive's inability to preserve the original time stamp of uploaded files. This is the case in Dropbox and in Google Drive. After communicating with Webdrive, I found out that they do support time stamps on Google Drive, but not on Dropbox. The old version of ViceVersa I was using was somehow unable to maintain time stamps with Webdrive + Google Drive. I am still a little mystified about this, considering Webdrive is the last one to touch the file on the way to Google Drive.


20160412 Dropbox, Boxifier

I am experimenting with Dropbox. I hope that I can just set it and forget it. Except I am learning that these programs, in general, are limited in the number of files they can handle. Symbolic links seem to be working, at least for me, so I don't see what the hype is about the Dropbox addon called Boxifier, except for use with external drives and network shares. Dropbox recommends moving your files into the Dropbox folder and creating symlinks to replace them, but I am doing the opposite, because I may use other sync programs/services as well. Meanwhile, I also have a backup on an external drive, to recover from the stupidities of misbehaving sync apps, and from my own errors.
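The "opposite" arrangement (files stay where they are, and the sync folder only holds links to them) can be sketched as below. This assumes a system where Python's `os.symlink` is available and permitted, which is not a given on older Windows versions; the function name and paths are placeholders of mine, and whether a given sync client follows the link is up to that client.

```python
import os
from pathlib import Path


def link_into_sync_folder(real_dir, sync_root):
    """Create a symlink inside sync_root that points at real_dir; return the link path."""
    real_dir = Path(real_dir).resolve()
    link = Path(sync_root) / real_dir.name
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    # target_is_directory only matters on Windows; it is ignored elsewhere
    os.symlink(real_dir, link, target_is_directory=True)
    return link
```

Because the real folder never moves, the same folder can be linked into the sync folder of a second service as well.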


20160409 Webdrive, Insync, Goodsync, Google Drive, Dropbox

Webdrive support told me which registry value (AuthCode) takes the secret code from Google. So now with Webdrive, I can mount Google Drive as a local hard drive! Follow these instructions:

1. Run regedit and check the settings for your site. See if it has an “AuthRefreshToken”. If so, wipe it out to be an empty string. (HKEY_CURRENT_USER\Software\South River Technologies\WebDrive\Connections\<your site name>)

2. Make sure the “ServerType” setting for that site is set to 8.

3. Then open browser to this URL.

4. This will give you the code to copy/paste. Take the code and paste it into “AuthCode” in regedit, and then you can connect to the server. This auth code can only be used once so you can't use an old auth code - you have to create a new one. Once you have this setup and connected then you shouldn't have to do this again.

It is incredibly slow, but not as slow as Insync. That is the nature of the beast: it is not an actual local hard drive. I think Goodsync is faster because it is multi-threaded. I think I can accomplish the same in Webdrive by manually running multiple transfers at the same time. In the settings for a Webdrive connection, you can specify how many simultaneous connections are allowed. At the moment, I am using my own favorite file comparison tool, ViceVersa, along with Webdrive. It is nice to have some familiarity when working in a new environment: I can check that the backup on Google Drive matches my local files when using Goodsync or other sync programs.
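Running several transfers at once does not have to be done by hand. Here is a minimal sketch of a multi-threaded copy to a mounted drive letter; it is not how Goodsync or Webdrive work internally, the function name is my own, and the worker count should stay at or below the number of simultaneous connections allowed in the connection settings.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src: Path, dst: Path, workers: int = 4) -> int:
    """Copy every file under src to dst using several threads; return the file count."""
    files = [p for p in src.rglob("*") if p.is_file()]

    def copy_one(p: Path) -> None:
        target = dst / p.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also tries to carry over the modified time stamp
        shutil.copy2(p, target)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every copy and re-raises any error from a worker
        list(pool.map(copy_one, files))
    return len(files)
```

Threads help here because each transfer spends most of its time waiting on the network, not on the CPU.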

Goodsync has been giving me errors. I believe this has to do with the poor quality of Google Drive's programming. I can see how third-party apps that interface with Google Drive would also have a hard time. Once upon a time, Google could do no wrong. I suppose that was a time when nerds were in charge. In a large organization, people who are good at manipulating others tend to climb the leadership ladder, rather than the nerdy ones who focus on technology.

Newer versions of Goodsync may have fixed the errors I was having (the release notes said so); however, newer versions also cause my system to BSOD. In fact, I had to access the system offline to keep Goodsync from running, just to get back to business. Goodsync said I probably have bad drivers, though I'm not sure which drivers they may be referring to.

Webdrive has been pretty reliable with Google Drive, but it can't maintain the modified date time stamp of the original files. Files on Google Drive have only the date and time they were uploaded. Update: it was the old version of ViceVersa that didn't maintain the time stamp. Goodsync maintains the modified date stamp on the file.
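Whether a given tool chain keeps the modified time can be checked after the fact by comparing the local folder against the mounted drive. A small sketch with names of my own; the 2-second tolerance is an assumption to allow for file systems that store time stamps coarsely.

```python
from pathlib import Path


def timestamp_mismatches(local: Path, remote: Path, tolerance: float = 2.0) -> list:
    """List relative paths whose modified times differ by more than tolerance seconds."""
    mismatches = []
    for p in sorted(local.rglob("*")):
        if not p.is_file():
            continue
        rel = p.relative_to(local)
        other = remote / rel
        # files missing on the remote side are skipped, not reported
        if other.is_file() and abs(p.stat().st_mtime - other.stat().st_mtime) > tolerance:
            mismatches.append(rel.as_posix())
    return mismatches
```

If every uploaded file shows up in the result, the time stamp is being replaced by the upload time somewhere along the way.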

With “automatic” file sync programs like Insync, I don't know what kind of sync they are using. I liked that Goodsync was more like a file comparison program, but when there were errors, there wasn't much I could do. Having Google Drive mounted as a hard drive with Webdrive, I can use my favorite tools to investigate and fix the errors. At the moment, I'm using an old free version of ViceVersa, and SuperCopier (the better versions developed by SFXTeam, for XP and Vista/7; see the French Wikipedia article for reference).

Thanks to adding SHA-2 to my operating system, I also got to try out Google Drive. Google Drive desktop client was horrible. I can't begin to tell you how unreliable it was in my sync testing. Also, I was expecting it to show up as an actual drive. It is just a sync folder like Dropbox. The difference is, that Dropbox works, and also supports symbolic links.


20160405 Koofr, Webdrive

I tried Koofr. At first it didn't run on XP SP2, because SP2 lacks SHA-2 support. So I dropped the SP3 files crypt32.dll and rsaenh.dll into the system32 folder, and that did the trick. I found out that I could only sync a local folder on my computer with Koofr, not with Google Drive (by design). So there went the whole idea of having Koofr be the intermediary between a bunch of storage services.

Koofr does, however, have a feature that lets you sync between cloud services.

The XP SP3 files crypt32.dll and rsaenh.dll almost got Webdrive to work with Google Drive. I had to copy the URL from the ieframe to Internet Explorer, which gave me the key to paste into the Webdrive program. Unfortunately, there was nowhere to paste the key. I see that the connection info is stored in the Windows registry under: HKEY_CURRENT_USER\Software\South River Technologies\WebDrive\Connections\google drive\. I sent a support request to Webdrive, and am hoping for the best. I don't really want to apply SP3, as I am averse to software bloat.


20160324 Google Drive, Insync, Goodsync, Owncloud

I could have just installed Google Drive, and used a Windows-based file sync program. But I didn't. Why? Because I'm still using Windows XP, and Google no longer supports it. Specifically, Google no longer supports Internet Explorer 8 or below, and IE9 does not support XP (for some reason, authentication must happen through IE, rather than the default browser). Why am I using XP? Because I can still get away with it, and because it runs so much faster than Vista, 7, 8, or 10. I'd still be using Win98SE if I could run the software I want on it. Windows XP will also be abandoned by software developers, so moving on is inevitable, but I'm holding out.

So I started using some alternatives to Google Drive that work with the Google Drive API. First was Insync. Using Insync I was averaging 5GB per day, uploading to Google Drive. I then tried Goodsync, and I have already uploaded 25GB in less than one day. If you don't have a lot of files, Insync would probably be great for the beginner.

Compared to Insync, Goodsync is not as intuitive. For one, you need to find the Browse button to select an online service. I was looking like crazy through the menus! After that, the interface is a lot like FreeFileSync or other file comparison tools, which may not be easy for a newbie. Unlike traditional file comparison tools, ones like Goodsync are capable of using FUSE (Filesystem in Userspace) with online storage services.

I have yet to try other sync implementations from online services (like Koofr), which sync to online storage services. Owncloud could be awesome, but it is a royal PITA to set up, and the sync options are very poor (I want to be able to exclude directories, and I can't use symbolic links in a folder to achieve this, because they decided to not support symlinks after 1.5x). Owncloud has a very confusing and dangerous interface: https://github.com/owncloud/client/issues/2404. Never mind that it says the issue is resolved: it isn't. They just changed the wording to make it more clear. They don't understand that people are looking to exclude certain folders from backing up to the server, not delete them off their drive. The intention of Owncloud is to keep all files on the server, and only choose certain ones to have locally, but that isn't always what you want. I have a long-standing way of organizing my files, and I don't plan to reorganize them for Owncloud.

 
information-technology/2019-file-sync-and-backup.txt · Last modified: 2023/12/21 04:33 by 127.0.0.1