
Nextcloud

I've been using Nextcloud, installed on my web hosting server, for the sole purpose of syncing passwords between my Android phone and the Pale Moon browser on Windows. Nextcloud aims to be collaboration software rather than a file sync tool, but I've been using it for sync. I wish I had recorded the steps I followed to set it up, but it wasn't difficult.

Today I got an error message saying that a file on the server was locked. I found this page, which gave me a workaround: on shared hosting, where I can't install Redis (the in-memory data store Nextcloud recommends as a more sophisticated file locking backend), I should just disable file locking.

“…edit your configuration file config/config.php:”

  'filelocking.enabled' => false,

Does this mean I now have to back up my database in case of corruption? It's unlikely that my phone will write to the database at the same time as my desktop, considering it's just me.

I really like DokuWiki's non-database solution. If the file has changed since I started editing, I get a message asking whether to keep the other change or overwrite it with my current edit. In Windows, I use a file unlocker all the time without ill effect, although on a few occasions I can't unlock a file and have to wait until the next reboot to delete it. I wish I could disable file locking in Windows. I don't know what the consequences would be if I were able to delete such a file against the system's wishes.
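DokuWiki's approach is a form of optimistic concurrency: nothing is locked while you edit; instead, the editor remembers what the file looked like when editing began and checks again just before saving. Below is a minimal Python sketch of that idea, not DokuWiki's actual code. It assumes the file's modification time plus size is a good-enough fingerprint, and the function names are made up for illustration.

```python
import os


class EditConflict(Exception):
    """Raised when the file changed on disk after editing began."""


def fingerprint(path):
    # Modification time (nanoseconds) plus size: cheap, but not a content hash.
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def begin_edit(path):
    # Remember the fingerprint first, then read the text the user will edit.
    seen = fingerprint(path)
    with open(path, encoding="utf-8") as f:
        return f.read(), seen


def save_edit(path, new_text, seen, overwrite=False):
    # Refuse to save over someone else's change unless the user chose to overwrite.
    if not overwrite and fingerprint(path) != seen:
        raise EditConflict(f"{path} changed since editing began")
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_text)
    return fingerprint(path)  # new fingerprint for the next save
```

There is still a tiny window between the check and the write, so this reduces conflicts rather than ruling them out; a content hash would also catch edits that happen to keep the same size and timestamp.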

While I use Nextcloud on my web hosting server, I don't use it for file backup, because using a web host for large amounts of file storage would be an abuse of the service.


File Sync and Backup

I'd rather pay for online storage. I could have a NAS (Network Attached Storage) or a server, but I don't want to maintain additional network hardware. I want to own fewer things.

File Management Strategy

You can have all your files on a server, and work with them remotely from all your devices. The server takes care of backup with file versioning.
Pros:
- Easy maintenance
Cons:
- Slower file access
- No access offline

You can sync your files to all your devices, including a copy on the file server, which provides backup with file versioning.
Pros:
- Fast access to files
Cons:
- Overhead in terms of system resources (especially CPU usage for encrypted file transfers)
- Higher file system maintenance on devices

The best way is to keep all the files on the server and sync only the sets of files you use frequently. In addition, you can mount the server file system as a network drive, so you don't have to download everything when you occasionally need to work with a few of the other files. This is my current ideal.

I've had some poor experiences before, but I'm going to try again, since both the software and my knowledge may have improved since then. I'm going to try locally running software such as WebDrive, NetDrive, SftpNetDrive, or Cyberduck. If anything, I feel stuck because there are too many choices.

I left a question over at serverfault.com, and had to provide my own sort-of-an-answer.

Everything below this point reaches no positive solutions, but it can be useful as reference.


Servers and Object Storage

Motivation for a Virtual Private Server

I want to configure my own server so I can have granular control over the settings. I want to tone down the encryption/compression so that communicating with the server is easier on my laptop's system resources. Chances are you have a speedy machine and don't care; you would be happy with the defaults of a commercial service for that purpose (like Backblaze, SpiderOak, or Carbonite). However, if I figure out how to get this done and give you easy instructions on setting up your own server, you will have an alternative to the commercial options. It might even be better than the consumer options, but I don't know enough to answer that question.


Cloud Storage File System

Online storage services use object storage, as opposed to file systems, which manage data as a file hierarchy. With Amazon S3 and Google Cloud Storage, the difference isn't hidden from you: you have to deal with it yourself. Even with Google Drive, when I deleted a few gigabytes of data all at once, I saw supposedly deleted files lingering around. Kind of makes you question your sanity.

Take a look at what WebDrive says about Amazon S3:

“- S3 does not have a concept of folders, but WebDrive uses the S3 prefix/delimiter option to make it appear as folders with '/' being the delimiter. This allows you to create folders with WebDrive and put files into them.
- S3 does not support a rename operation. So to rename a file WebDrive downloads the file then uploads it to the new name and deletes the original. This means you can not rename a folder that has files in it.

Remember buckets are global in S3, so after mapping a drive to the S3 service, you should use the DOS prompt to create the folder and not use Explorer if you create a bucket/folder at the root level, since it's global. Something like 'New folder' will fail because somebody else in the world has already created one. So open a DOS prompt and enter something like 'MKDIR my-name-that-is-sure-to-be-unique', perhaps with your name or some numbers in it. Something like 'Docs' will fail.”
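To make the prefix/delimiter trick concrete, here is a small in-memory model in Python: the bucket is just a flat set of key strings, and a listing with a prefix and the '/' delimiter splits the matching keys into "files" and "common prefixes", which a client like WebDrive can display as folders. It's a simplified sketch of the idea behind the Prefix and Delimiter parameters of S3's list operations (no pagination, no real API calls), and `list_with_delimiter` is a hypothetical name.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Return (files, folders) directly under `prefix`, S3-style.

    files:   matching keys with no delimiter after the prefix
    folders: distinct "common prefixes", cut just after the next delimiter
    """
    files, folders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            files.append(key)
        else:
            folders.add(prefix + rest[: cut + 1])
    return files, sorted(folders)


# The bucket itself has no folders: only these four flat key strings.
bucket = {
    "docs/notes.txt",
    "docs/2019/plan.txt",
    "docs/2019/budget.ods",
    "photo.jpg",
}
print(list_with_delimiter(bucket))           # → (['photo.jpg'], ['docs/'])
print(list_with_delimiter(bucket, "docs/"))  # → (['docs/notes.txt'], ['docs/2019/'])
```

In this model a "folder" exists only as long as some key starts with its prefix, which is why some clients create an empty placeholder object to represent a new, empty folder.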


Preserving Modified Date for a File using Buckets

NetDrive: https://support.bdrive.com/t/the-created-modified-date-of-files-seem-to-spontaneously-change/53156
“Unfortunately, the protocol sftp doesn’t support SetFileTime for a file in remote storage.
It’s not supported by sftp as well as webdav.
Some ftp servers support it.
GoogleDrive or box support it.”

Cloudmounter: https://www.eltima.com/forum/index.php?topic=128373.0
I've received an update from our developer regarding this matter. First of all, let me clarify that not all servers are able to give the date parameter.
With AWS however, the creation date is not transmitted on any server. This is a server side configuration and, unfortunately, we cannot affect this from our side.

S3: https://forums.aws.amazon.com/thread.jspa?messageID=738308
“Currently S3 does not support customizing the Last Modified value, as per the following documentation
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
However, one way you can achieve this would be to use the AWS CLI to store this information when using the 'aws s3api put-object' command to upload files and appending the following argument:
--metadata (map) A map of metadata to store with the object in S3.”

Cyberduck: https://trac.cyberduck.io/ticket/6441
The ticket “Preserve modification date does not work with Amazon S3” was changed from a defect to an enhancement and then closed as wontfix, with this comment: “Setting timestamps is not currently supported in S3 because the API has no such feature per se. We could add our own proprietary metadata to the uploaded files to store the modification date.”

Using metadata to store modified date in Google Cloud Storage
Command line tool for Linux that works
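The workaround suggested in the AWS and Cyberduck threads is to carry the modification time as user-defined object metadata and re-apply it after download. A minimal Python sketch of both halves using only the standard library; the metadata key name `mtime` and the helper names are assumptions for illustration. The map it builds is the kind of thing you would pass to `aws s3api put-object --metadata` (as `mtime=...`) or to the `Metadata` argument of boto3's `put_object`.

```python
import os

MTIME_KEY = "mtime"  # hypothetical metadata key; any agreed-upon name works


def mtime_metadata(path):
    """Build a user-metadata map carrying the file's modification time.

    Object metadata values are strings, so the timestamp is stored as
    integer nanoseconds in text form.
    """
    return {MTIME_KEY: str(os.stat(path).st_mtime_ns)}


def restore_mtime(path, metadata):
    """After downloading, put the original modification time back.

    Returns False when the object was uploaded without our metadata,
    leaving the download time in place.
    """
    raw = metadata.get(MTIME_KEY)
    if raw is None:
        return False
    atime_ns = os.stat(path).st_atime_ns  # keep the access time as it is
    os.utime(path, ns=(atime_ns, int(raw)))
    return True
```

Only tools that know about the agreed key will honor it, which is exactly the "proprietary metadata" caveat from the Cyberduck ticket.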


File Sharing Protocols

WinSCP provides a good comparison of the file system features supported by the communication protocols WebDAV, SFTP, and FTP. The way I understand it, WebDAV focuses on user collaboration, while SFTP focuses on providing a remote file system.

According to WinSCP, SFTP is slow because of the SSH encryption and because of the need for packet confirmations. If I set up my own server, could I configure SFTP to run with a lower encryption level and skip the confirmation packets? There is some speculation that the problem is WinSCP's implementation of SFTP. Not finding consistent answers on the web leaves me unsure which protocol to choose. This topic has been a rabbit hole for me for years.
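One way to see why waiting for confirmations matters is a back-of-the-envelope bound: if a client sends fixed-size requests and only N of them may be outstanding at once, it can't move more than N × block size per round trip, whatever the cipher. The Python sketch below computes that bound. The 32 KiB block and 64 outstanding requests are the defaults documented for OpenSSH's `sftp` client (its `-B` and `-R` options); the round-trip times and the 100 Mbit/s link are made-up examples, and other clients may pipeline differently.

```python
def window_limited_throughput(block_bytes, outstanding, rtt_s, link_bytes_per_s):
    """Upper bound on a request/acknowledge transfer, in bytes per second.

    At most `outstanding` blocks can be in flight per round trip, so the
    protocol can't move more than outstanding * block_bytes every rtt_s
    seconds, and it can never beat the raw link speed either.
    """
    protocol_limit = outstanding * block_bytes / rtt_s
    return min(protocol_limit, link_bytes_per_s)


MiB = 1024 * 1024
BLOCK = 32 * 1024   # OpenSSH sftp default buffer size (-B)
link = 100e6 / 8    # hypothetical 100 Mbit/s connection, in bytes per second

for rtt_ms in (1, 20, 150):  # hypothetical round-trip times
    rtt = rtt_ms / 1000
    single = window_limited_throughput(BLOCK, 1, rtt, link)
    piped = window_limited_throughput(BLOCK, 64, rtt, link)  # sftp default (-R)
    print(f"RTT {rtt_ms:>3} ms: 1 in flight {single / MiB:5.2f} MiB/s, "
          f"64 in flight {piped / MiB:5.2f} MiB/s")
```

With only one request in flight, a 20 ms round trip already caps the transfer at about 1.6 MiB/s no matter how fast the CPU is. The model ignores encryption cost entirely, so it complements WinSCP's point rather than contradicting it.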


Amazon Throughput Rate

“…sc1 uses a burst-bucket model for performance. Volume size determines the baseline throughput of your volume, which is the rate at which the volume accumulates throughput credits. Volume size also determines the burst throughput of your volume, which is the rate at which you can spend credits when they are available. Larger volumes have higher baseline and burst throughput. The more credits your volume has, the longer it can drive I/O at the burst level.

For a 1-TiB sc1 volume, burst throughput is limited to 80 MiB/s, the bucket fills with credits at 12 MiB/s, and it can hold up to 1 TiB-worth of credits.” (source)

I conclude that for my meager 150GB (about 0.14 TiB), my baseline throughput will be ~1.6 MiB/s and my burst throughput ~11 MiB/s. The bucket can hold at most 150GB worth of credits, and it refills at the baseline rate of ~1.6 MiB/s. I don't think I will ever drop below the burst rate!
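The arithmetic above can be checked in a few lines of Python, using only the figures from the AWS quote (12 and 80 MiB/s per TiB, and a bucket holding one TiB-worth of credits per TiB). It assumes "150GB" means 150 × 10^9 bytes and ignores any minimum sizes or caps AWS may apply; it also estimates how long a full bucket lasts at the burst rate and how long an empty one takes to refill.

```python
TiB = 2**40
MiB = 2**20

size_bytes = 150e9            # "150GB", taken as decimal gigabytes
size_tib = size_bytes / TiB   # about 0.136 TiB

# sc1 figures from the quote, scaled by volume size
baseline = 12 * size_tib      # MiB/s: rate at which credits accumulate
burst = 80 * size_tib         # MiB/s: rate at which credits can be spent
bucket = size_bytes / MiB     # MiB of credits when full (1 TiB per TiB)

# A full bucket drains at (burst - baseline) while bursting continuously
hours_of_burst = bucket / (burst - baseline) / 3600
# An empty bucket refills at the baseline rate when the volume is idle
hours_to_refill = bucket / baseline / 3600

print(f"baseline {baseline:.2f} MiB/s, burst {burst:.2f} MiB/s")  # → baseline 1.64 MiB/s, burst 10.91 MiB/s
print(f"full bucket sustains burst for {hours_of_burst:.1f} h")    # → 4.3 h
print(f"empty bucket refills in {hours_to_refill:.1f} h")          # → 24.3 h
```

Because every figure in the quote scales with volume size, both durations come out the same for any sc1 size under this model: about four hours of continuous burst-rate I/O from a full bucket, and about a day to refill an empty one.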


VPS via Command Line

Do I need to install a Linux SFTP server to make use of its file system? Much as I want to avoid maintaining a server, the alternative is being at the mercy of proprietary solutions (see the trials and tribulations below). However, taking a quick look around, a VPS with the 150GB of storage I want is expensive, starting at about $30/mth.

Google Cloud Platform, and probably Amazon Web Services as well, offers a free weakling-level VPS. The one at Google gets a whole one fifth of a CPU core.

The server would be using FUSE to simulate a hierarchical file system on the mounted buckets (similar to what WebDrive described above). The following is a technical overview from Google:

“Cloud Storage FUSE works by translating object storage names into a file and directory system, interpreting the “/” character in object names as a directory separator so that objects with the same common prefix are treated as files in the same directory. Applications can interact with the mounted bucket like a simple file system, providing virtually limitless file storage running in the cloud. … While Cloud Storage FUSE has a file system interface, it is not like an NFS or SMB file system on the backend.” source

Which is better: local FUSE software that interacts with the buckets over the internet, or FUSE software running on the server that interacts with buckets hosted by the same provider? Based on what I've read recently, running file system commands over HTTP is not ideal, because HTTP wasn't designed for that purpose. WebDAV is an extension of HTTP, while SFTP runs over SSH instead.

Let me think of an example: if I were to rename a folder with 1000 files, that would mean 1000 file rename instructions traveling a long distance to the server. In my experience, packets sometimes get dropped on the internet. I think running FUSE within the IaaS (infrastructure as a service) provider would be advantageous.
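Here is that example as a Python sketch against a toy in-memory bucket (a dict of key to bytes): since object storage has no rename, every object under the old prefix is copied to a new key and the original deleted. S3 does have a server-side copy request, so a client doesn't necessarily have to download and re-upload as the WebDrive quote describes, but it is still at least two requests per object. The round-trip times are assumptions, and the estimate assumes each request waits for the previous reply.

```python
def rename_prefix(bucket, old, new):
    """'Rename' a folder in a flat key -> bytes bucket (toy model).

    Object storage has no rename, so each object under the old prefix is
    copied to its new key and then the original is deleted: two requests
    per object. Assumes the new prefix is not inside the old one.
    Returns the number of requests issued.
    """
    requests = 0
    for key in [k for k in bucket if k.startswith(old)]:  # snapshot before mutating
        bucket[new + key[len(old):]] = bucket[key]  # copy request
        del bucket[key]                             # delete request
        requests += 2
    return requests


bucket = {f"photos/img{i:04}.jpg": b"..." for i in range(1000)}
n = rename_prefix(bucket, "photos/", "pictures/")

# Hypothetical round-trip times; one request at a time.
for where, rtt_s in (("from home", 0.080), ("inside the provider", 0.002)):
    print(f"{n} requests {where}: about {n * rtt_s:.0f} s")
# → 2000 requests from home: about 160 s
# → 2000 requests inside the provider: about 4 s
```

Each of those requests is also a chance for a dropped connection to leave the "folder" half-renamed, which supports the idea of doing this work close to the storage.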

The VPS interacting with the buckets is no different than my own machine doing so, except that:

  • the physical distance from the VPS to the bucket storage may be shorter and require fewer hops
  • the FUSE implementation may be superior on either the locally running software, or the VPS server




VPS with Web Based GUI

Duplicati

Syncthing

I would rather use something manual at first, like FreeFileSync or ViceVersa, which visually shows what will happen before it occurs. I'm afraid sync can sometimes do something you don't want, often due to user error (assuming what it will do rather than checking what it actually does).

However, you can't connect to the FreeFileSync or ViceVersa interfaces over the internet, because they are local GUI applications, whereas Syncthing and Duplicati run in a web browser. This would be a non-issue if I mounted the bucket storage as a local drive. However, I've had issues using ViceVersa with WebDrive and Google Drive in the past. Integrated solutions are less likely to have bugs.

With a sync program that doesn't graphically show you what it will do before it executes, best practice would be to first make an offline backup, and second to use a small subset of files, so you can go over results quickly.
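The "show me first" behavior I like in FreeFileSync and ViceVersa boils down to computing a plan before touching anything. A minimal Python sketch of a one-way mirror plan, assuming size plus modification time (truncated to whole seconds) is enough to detect changes; the function names are hypothetical, and nothing is copied or deleted.

```python
import os


def scan(root):
    """Map each file's path relative to root to (size, mtime in whole seconds)."""
    found = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            found[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return found


def plan_mirror(source, target):
    """List what a one-way mirror from source to target would do, without doing it."""
    src, dst = scan(source), scan(target)
    plan = []
    for rel in sorted(src.keys() | dst.keys()):
        if rel not in dst:
            plan.append(("copy", rel))      # new on the source side
        elif rel not in src:
            plan.append(("delete", rel))    # gone from the source side
        elif src[rel] != dst[rel]:
            plan.append(("update", rel))    # size or timestamp differs
    return plan
```

Calling `plan_mirror(source, target)` with two folder paths returns a list such as `[("copy", "a.txt"), ("delete", "c.txt")]` that can be reviewed, or run against a small subset first, before anything is executed. The "delete" entries are the ones worth reading twice.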

A bare-bones virtual machine on either Amazon or Google would be free, and I would be paying for Amazon's Elastic Block Store (EBS) or Google's Standard Persistent Disk. The EBS volume or Persistent Disk would be mounted on the virtual machine (VM) instance. See this comparison of terminology.

I would then run the free and open-source Syncthing on my computer and also on the server, since versions are available for both Windows and Linux. The forum notes are inconclusive. This video on YouTube shows how easy the Syncthing interface is to use.

Syncthing will not mount the remote file system as a network drive; instead, it syncs a folder (much like Dropbox). I can use symbolic links and junctions. I can still run an SFTP server separately (on the same VM) to mount a network drive and view the data that is not synced.

Syncthing uses its own Block Exchange Protocol, not SFTP, WebDAV, or SMB, which I was considering before.

Amazon: $0.025 per GB-month of provisioned storage (sc1)
Google: $0.040 per GB-month of Standard provisioned space

For 150GB, Amazon comes to $3.75/mth, and Google's Standard Persistent Disk to $6/mth. Compare that not with Google's disk pricing but with Google Drive, which offers 200GB for $3/mth. However, I want control of the server.
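The monthly figures above, side by side and also per GB, using only the prices quoted in this section (2019 prices that have probably changed since). The sketch leaves out the VM itself and any network transfer charges, which cloud providers bill separately.

```python
needed_gb = 150

# name: (capacity in GB, dollars per month); block storage is billed per provisioned GB
options = {
    "Amazon EBS sc1 at $0.025/GB": (needed_gb, needed_gb * 0.025),
    "Google Standard Persistent Disk at $0.040/GB": (needed_gb, needed_gb * 0.040),
    "Google Drive 200GB plan": (200, 3.00),
}

# Cheapest monthly bill first
for name, (capacity_gb, dollars) in sorted(options.items(), key=lambda kv: kv[1][1]):
    print(f"${dollars:5.2f}/mth  ${dollars / capacity_gb:.3f}/GB  {name}")
```

Per GB, the consumer plan comes out cheapest of the three, which is the trade-off against having control of the server.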


2018

In the past I was somewhat confused about the big picture because I was wrestling with the details, much the way I still am. Following these links is preferable to reading the rest of this article:
https://www.backblaze.com/blog/sync-vs-backup-vs-storage
https://www.cloudwards.net/how-to-set-up-a-cloud-network-drive

Doing your own file synchronization, for example with FreeFileSync or ViceVersa, takes a lot of work, especially if you are always organizing your files, and doing so on more than one computer. It's better if you can use automatic sync, or work off a network drive. During a manual sync, for example, when looking at the differences, you won't be sure whether to delete certain files, because you can't remember whether you moved them to another location while organizing. If you don't delete them, you end up with duplicates. I'm thinking it may be better to leave merging to an online backup/sync service. As a bonus, many of these services provide versioning.

If your files exceed the quota of the service, you need to create an archive of old, stagnant files and store it outside of the sync. You can make your own external backup of these files. Even if you are within quota, having fewer files in the sync means less resource overhead. From my experiences in 2016, it's best to keep as few files as possible in online backup services. It doesn't matter if Mega gives you 50GB, or Google Drive 100GB for pocket change. Don't use that much!
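Finding the "old, stagnant files" can be automated. A Python sketch that picks files not modified for a given number of days and, only when asked, moves them into an archive folder outside the synced tree, preserving their relative paths. The age threshold and the function name are assumptions; it defaults to a dry run so the list can be reviewed first.

```python
import os
import shutil
import time


def archive_stale(sync_root, archive_root, max_age_days, dry_run=True):
    """Move files not modified for max_age_days out of the synced tree.

    Relative paths are preserved under archive_root. With dry_run=True
    (the default) nothing is moved; the sorted list of candidates is
    returned either way.
    """
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for dirpath, _dirs, files in os.walk(sync_root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.stat(full).st_mtime < cutoff:
                stale.append(os.path.relpath(full, sync_root))
    if not dry_run:
        # Move only after the walk has finished, so os.walk isn't disturbed.
        for rel in stale:
            dest = os.path.join(archive_root, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.move(os.path.join(sync_root, rel), dest)
    return sorted(stale)
```

Empty folders are left behind in the synced tree, and the archive folder must not be inside the folder being synced, or the archived files would simply be synced again.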

In general, it's better to have fewer files. I used to think that in the virtual world of computers, you could have a practically unlimited amount of stuff (unlike the real world, where I want to have as little as possible). Well, keeping less stuff also applies in the virtual world: keeping files requires maintenance.

I'm currently using Mega (MEGAsync). Things I like about Mega:
- It can be made portable (doesn't need to be installed)
- It's outside of the United States, far away from local government (law enforcement, NSA, etc., are like hammers, and to hammers everything is a nail)
I think all the services are rather comparable: I can't say one is that much better than the others. Mega is light on the system as long as the number of files is low. I believe the main reason Mega (like any of these services) can consume so many system resources is that file encryption is CPU intensive. I wish there was a way to disable or reduce the file encryption. I'm just not a big fish for any fisherman out there.


2016

I have figured out how to classify the services available, which makes it much easier to know what I want. I have labelled them as follows.

  • services: online storage services, that have sync capability via an api (application program interface), and/or their own sync program
  • clientside: sync programs running on your computer that work with the aforementioned services via API.
  • serverside: an online service with unified sync to multiple storage services, where the server handles aforementioned services via api.

Of the three, I think serverside could be the most favorable, because there is less work for your computer to do. Your computer only has to interact with one service, and the service (aka server) takes care of all the other services (such as Google Drive, Dropbox, etc.). So why use multiple services instead of just one? To compare them, of course! They have different features. Also, if one online service closes down, it's easy to move files to another. Google definitely drops services on a whim.

  • Examples of services are Google Drive, Dropbox, Mega, etc.
  • Examples of clientside are Insync, Goodsync, Webdrive, etc.
  • Examples of serverside are Koofr, Odrive, Storage Made Easy, Cloud Combine, and Air Explorer.

Most people will be fine with any free account, unless they have a lot of files. I have around 160GB. It is not just a matter of finding that much space. With all the encryption and compression and file comparison that these sync programs do, your system resources take a hit. Dropbox states that their application will slow down your computer at around 300,000 files. I have ~200,000, and with my old computer, it becomes unresponsive. Most of my space is filled with software, especially with all the remastering of Windows that I have done.
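Since the Dropbox warning is about the number of files rather than bytes, it helps to count both. A Python sketch that tallies files and bytes per top-level folder, so it's clear which folders push the total toward the ~300,000 figure quoted above; the function name is hypothetical.

```python
import os


def tally(root):
    """Count files and bytes under each top-level entry of root.

    Files sitting directly in root are grouped under ".". File symlinks are
    counted by their own size (os.lstat), and os.walk does not descend into
    directory symlinks by default.
    Returns ({top-level name: (files, bytes)}, (total files, total bytes)).
    """
    per_top = {}
    for dirpath, _dirs, files in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        count, size = per_top.get(top, (0, 0))
        for name in files:
            size += os.lstat(os.path.join(dirpath, name)).st_size
            count += 1
        per_top[top] = (count, size)
    total = (sum(c for c, _ in per_top.values()),
             sum(s for _, s in per_top.values()))
    return per_top, total
```

For example, `per_top, (files, size) = tally(r"D:\Sync")` (a hypothetical path) gives the numbers to compare with 300,000 and shows which folders are the best candidates to move out of the sync.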

I recall that when I was using Carbonite between 2007 and 2012, it could back up all my files with an acceptable toll on system resources. That was after Carbonite had caught up with uploading all the files on my computer, and the same happens with Dropbox. The trick is to leave your sync or backup program running when you aren't at your computer. Once the sync has caught up, you can leave the sync program running all the time.

In all these services and applications, I wish there was an opt-out option, for encryption/compression, in order to use less CPU and memory. Only a select few of my files would need any security at all.

I have experienced one exception to the high resource usage of a sync program confronted with a mountain of new files: Mega. It was averaging only 10% of my CPU, and used much less RAM than Dropbox. Thank you, Mega.

The following sections should be read in reverse order, to be chronologically correct.


20160420 Dropbox, Mega, Webdrive

Dropbox brings my computer to a halt: it consumes CPU and memory like crazy (not only in its own process, but in explorer.exe as well!). I'm hoping that once it catches up on uploading all my files, it will stop being a pain. I have even moved all my files into the Dropbox folders, rather than using symlinks. They may say it is 1TB (I'm using a one-month business trial account), but that's a joke if the desktop app can't handle too many files.

I had been waiting to hear from anyone at Dropbox after many requests for help. The result was a useless canned response, and a few days later, they announced they were dropping support for Windows XP. I watched in FileMon as it kept wastefully looking up a Desktop.ini file at the root of the Dropbox folder, more than doing anything useful, like syncing my files. It almost seems like the high resource usage is intentional. I responded by denying access to that file through its NTFS ACL.

To make it more usable, I split up my files into less than 50GB per account. So I currently have one business and one personal account, which are tied together and which I can use via the Dropbox desktop app. Then there is another business Dropbox account, to which I send files through Webdrive, and finally, I have a Mega account.
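Splitting files across accounts with a size cap is a bin-packing problem. A Python sketch using the first-fit decreasing heuristic (simple, not guaranteed optimal), assuming whole top-level folders are the unit being assigned; the folder names and sizes in the example are made up.

```python
def split_into_accounts(folder_sizes, limit_bytes):
    """Assign folders to accounts with first-fit decreasing.

    folder_sizes: {folder name: size in bytes}. Each folder stays whole,
    so a single folder larger than the limit is rejected.
    Returns a list of accounts, each a list of folder names.
    """
    accounts = []  # each entry: [used bytes, [folder names]]
    # Largest folders first, each into the first account with room left.
    for name, size in sorted(folder_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > limit_bytes:
            raise ValueError(f"{name} alone exceeds the limit; split it first")
        for account in accounts:
            if account[0] + size <= limit_bytes:
                account[0] += size
                account[1].append(name)
                break
        else:
            accounts.append([size, [name]])  # no room anywhere: open a new account
    return [names for _used, names in accounts]


GB = 10**9
sizes = {"Software": 45 * GB, "Photos": 30 * GB, "Docs": 15 * GB,
         "Music": 20 * GB, "Misc": 5 * GB}
print(split_into_accounts(sizes, 50 * GB))
# → [['Software', 'Misc'], ['Photos', 'Music'], ['Docs']]
```

Keeping folders whole means each account maps to a recognizable part of the file tree, at the cost of sometimes needing one more account than a perfect split would.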

Mega has behaved stupendously well. It just works.

I have been trying to deal with Webdrive's inability to preserve the original time stamp of uploaded files. This is the case in both Dropbox and Google Drive. After communicating with Webdrive, I found out that they do support time stamps on Google Drive, but not on Dropbox. The old version of ViceVersa I was using was somehow incompatible with maintaining time stamps through Webdrive + Google Drive. I am still a little mystified about this, considering Webdrive is the last one to touch the file on the way to Google Drive.


20160412 Dropbox, Boxifier

I am experimenting with Dropbox. I hope that I can just set it and forget it. However, I am learning that these programs, in general, are limited in the number of files they can handle. Symbolic links seem to be working, at least for me, so I don't see what the hype is about the Dropbox add-on called Boxifier, except for use with external drives and network shares. Dropbox recommends moving your files into the Dropbox folder and creating symlinks to replace them, but I am doing the opposite, because I may use other sync programs/services as well. Meanwhile, I also have a backup on an external drive, to recover from the stupidities of misbehaving sync apps and from my own errors.


20160409 Webdrive, Insync, Goodsync, Google Drive, Dropbox

Webdrive support told me in which registry value (AuthCode) to put the secret code from Google. So now, with Webdrive, I can mount Google Drive as a local hard drive! It is incredibly slow, but not as slow as Insync. It is the nature of the beast: it is not an actual local hard drive. I think Goodsync is faster because it is multi-threaded. I think I can accomplish the same in Webdrive by manually running multiple transfers at the same time; in the settings for a Webdrive connection, you can specify how many simultaneous connections are allowed. At the moment, I am using my own favorite file comparison tool, ViceVersa, along with Webdrive. It is nice to have some familiarity when working with a new environment: I can check that the backup on Google Drive matches my local files when using Goodsync or other sync programs.

Goodsync has been giving me errors. I believe this has to do with the poor quality of Google Drive's programming. I can see how third-party apps that interface with Google Drive would also have a hard time. Once upon a time, Google could do no wrong. I suppose that was a time when nerds were in charge. In a large organization, people who are good at manipulating others tend to climb the leadership ladder, rather than the nerdy ones who focus on technology.

Newer versions of Goodsync may have fixed the errors I was having (the release notes said so); however, newer versions also cause my system to BSOD. In fact, I had to access the system offline to keep Goodsync from running, so I could get back to business. Goodsync said I probably have bad drivers, though I'm not sure which drivers they may be referring to.

Webdrive has been pretty reliable with Google Drive, but it can't maintain the modified date time stamp of the original files: files on Google Drive have only the date and time they were uploaded. Update: it was the old version of ViceVersa that didn't maintain the time stamp. Goodsync maintains the modified date stamp on the file.

With “automatic” file sync programs like Insync, I don't know what kind of sync they are doing. I liked that Goodsync was more like a file comparison program, but when there were errors, there wasn't much I could do. With Google Drive mounted as a hard drive through Webdrive, I can use my favorite tools to investigate and fix the errors. At the moment, I'm using an old free version of ViceVersa, and SuperCopier (the better versions developed by SFXTeam, for XP and Vista/7; see the French Wikipedia article for reference).

Thanks to adding SHA-2 support to my operating system, I also got to try out Google Drive. The Google Drive desktop client was horrible. I can't begin to tell you how unreliable it was in my sync testing. Also, I was expecting it to show up as an actual drive, but it is just a sync folder like Dropbox. The difference is that Dropbox works, and it also supports symbolic links.


20160405 Koofr, Webdrive

I tried Koofr. At first it didn't run on XP SP2, because SP2 lacks SHA-2 support. So I dropped the SP3 files crypt32.dll and rsaenh.dll into the system32 folder, and that did the trick. I found out that I could only sync a local folder on my computer with Koofr, not with Google Drive (by design). So there went the whole idea of having Koofr be the intermediary between a bunch of storage services.

That said, Koofr does let you sync between cloud services.

The XP SP3 files crypt32.dll and rsaenh.dll almost got Webdrive to work with Google Drive. I had to copy the URL from the embedded ieframe window into Internet Explorer, which gave me the key to paste into the Webdrive program. Unfortunately, there was nowhere to paste the key. I see that the connection info is stored in the Windows registry under HKEY_CURRENT_USER\Software\South River Technologies\WebDrive\Connections\google drive\. I sent a support request to Webdrive, and I hope for the best. I don't really want to apply SP3, as I am averse to software bloat.


20160324 Google Drive, Insync, Goodsync, Owncloud

I could have just installed Google Drive and used a file sync program. But I didn't. Why? Because I'm still using Windows XP, and Google no longer supports it. Specifically, Google no longer supports Internet Explorer 8 or below, and IE9 does not support XP (for some reason, authentication must happen through IE rather than the default browser). Why am I using XP? Because I can still get away with it, and it runs so much faster than Vista, 7, 8, or 10. I'd still be using Win98SE, but I want to use modern software. Soon XP won't run modern software either, maybe in a year or two.

So I started using some alternatives that work with the Google Drive API. First was Insync. Using Insync I was averaging 5GB per day, uploading to Google Drive. I then tried Goodsync, and I have already uploaded 25GB in less than one day. If you don't have a lot of files, Insync would probably be great for the beginner.

Compared to Insync, Goodsync is not as intuitive. For one, you need to find the Browse button to select an online service, perhaps because they are trying to push their own Goodsync Connect. After that, the interface is a lot like FreeFileSync or other file comparison tools, which may not be easy for a newbie. Unlike traditional file comparison tools, ones like Goodsync are capable of using FUSE (Filesystem in Userspace) with online storage services.

I have yet to try other sync implementations from online services (like Koofr) that sync to online storage services. Owncloud could be awesome, but it is a royal PITA to set up, and the sync options are very poor (I want to be able to exclude directories, and I can't use symbolic links in a folder to achieve this, because they decided not to support symlinks after 1.5x). Owncloud has a very confusing and dangerous interface: https://github.com/owncloud/client/issues/2404. Never mind that it says the issue is resolved: it isn't. They just changed the wording to make it clearer. They don't understand that people are looking to exclude certain folders from backing up to the server, not to delete them from their drive. The intention of Owncloud is to keep all files on the server and only choose certain ones to have locally, but that isn't always what you want. I have a long-standing way of organizing my files, and I don't plan to reorganize them for Owncloud.
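What I wanted from Owncloud, excluding folders from upload without deleting them locally, is straightforward for a sync tool to implement: prune excluded directories before descending into them. A Python sketch with shell-style patterns; the pattern syntax and the function name are illustrative assumptions, not the format of Owncloud's or any other client's ignore list.

```python
import fnmatch
import os


def walk_with_excludes(root, exclude_patterns):
    """Yield file paths (relative to root, '/'-separated), skipping excluded folders.

    Patterns are shell-style and case-sensitive (fnmatch.fnmatchcase), matched
    against each folder's relative path; '*' also matches '/', so "*/cache"
    excludes a folder named cache at any depth below the top level.
    """
    for dirpath, dirs, files in os.walk(root):
        rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
        prefix = "" if rel == "." else rel + "/"
        # Pruning dirs in place stops os.walk from descending into excluded folders.
        dirs[:] = [d for d in dirs
                   if not any(fnmatch.fnmatchcase(prefix + d, p) for p in exclude_patterns)]
        for name in files:
            yield prefix + name
```

The excluded folders are simply never visited, so they stay untouched on the local drive; nothing about excluding them implies deleting them.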

information-technology/2019-file-sync-and-backup.txt · Last modified: 2019/06/24 22:40 by marcos