NAME

lumia - Manage checksums on a filesystem

SYNOPSIS

lumia command [-hqfv] arguments

DESCRIPTION

lumia is meant for managing checksums of files in order to prevent bitrot. It does this by storing several special files in each directory to keep track of the checksums:

.lumidify_archive_cksums

Contains the checksums of all files in the directory.

.lumidify_archive_dirs

Contains a list of all directories in the directory.

.lumidify_archive_cksums.cksum

Contains the checksums of .lumidify_archive_cksums and .lumidify_archive_dirs just because I'm paranoid.

.lumidify_archive_ignore

Contains a list of files and directories that should be ignored by lumia. Note that this is only read and never written to, unless the command clean is used. It is, however, still copied over by the extract command.

When the documentation for the commands talks about the "checksum database", it simply means these files.

All file/directory names are enclosed in quotes, with any backslashes or quotes inside the name escaped with another backslash. The names are allowed to have newlines in them.

The list files only contain a list of filenames, with a newline between the closing quote of one name and the opening quote of the next one.

The checksum files additionally contain the output of the checksum program used and a space before the starting quote of the filename.
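The quoting rules above can be illustrated with a minimal Python sketch. This is not the program's actual code. It assumes the quote character is the double quote, and the function names quote_name and parse_names are made up for this example.

```python
def quote_name(name):
    """Escape backslashes and quotes, then wrap the name in quotes."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def parse_names(text):
    """Parse the contents of a list file into a list of names."""
    names = []
    i = 0
    while i < len(text):
        if text[i] != '"':
            # Skip the newline between a closing and an opening quote.
            i += 1
            continue
        i += 1  # step over the opening quote
        chars = []
        while i < len(text) and text[i] != '"':
            if text[i] == "\\" and i + 1 < len(text):
                i += 1  # a backslash escapes the next character
            chars.append(text[i])
            i += 1
        i += 1  # step over the closing quote
        names.append("".join(chars))
    return names


# Round trip, including a name with an embedded newline.
names = ["plain.txt", 'with "quotes"', "back\\slash", "two\nlines"]
list_file = "\n".join(quote_name(n) for n in names)
assert parse_names(list_file) == names
```

Because a newline inside a name is always enclosed in quotes, the parser can tell it apart from the newline that separates two entries.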

OPTIONS

-h, --help

Show the full documentation.

-q, --quiet

Only output errors.

-f, --force

Overwrite files without prompting for confirmation.

-v, --verbose

Print each file that is processed by the command.

See the full documentation for details on which commands support which options and what they do.

It does not matter if the options are written before or after the command.

If -- is written anywhere on the command line, option parsing is stopped, so that files starting with a hyphen can still be specified.
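The two rules above (options may appear anywhere, and -- ends option parsing) can be sketched as follows in Python. This is only an illustration, not the program's actual parser. The function name split_args is hypothetical, and the sketch assumes that short options may be bundled, as the synopsis suggests.

```python
# Long option names mapped to their short forms, as listed in OPTIONS.
LONG_OPTIONS = {"help": "h", "quiet": "q", "force": "f", "verbose": "v"}


def split_args(argv):
    """Separate option flags from the command and its arguments."""
    options, positional = set(), []
    parsing_options = True
    for arg in argv:
        if parsing_options and arg == "--":
            # Everything after this point is taken literally.
            parsing_options = False
        elif parsing_options and arg.startswith("--"):
            name = arg[2:]
            options.add(LONG_OPTIONS.get(name, name))
        elif parsing_options and arg.startswith("-") and len(arg) > 1:
            options.update(arg[1:])  # bundled short options such as -vf
        else:
            positional.append(arg)
    return options, positional


# The name "-odd-name" survives because it follows "--".
print(split_args(["check", "-q", "--", "-odd-name"]))
```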

Note that -q and -v aren't exactly opposites: -q applies to commands like check, where it suppresses printing of the individual files, while -v applies to commands like cp, where it is just passed on to the system command called in the background.

Note further that this is very inconsistent, like the rest of the program, but the author has made too many bad decisions to rectify that problem at the moment.

COMMANDS

Note that some commands support multiple files/directories as arguments and others, for which it would make just as much sense, don't. That's just the way it is.

addnew [-q] [directory]

Walks through directory, adding all new files to the checksum database. directory defaults to the current directory.

-q suppresses the printing of each file or directory as it is added.

checknew [directory]

Walks through directory, printing all files that aren't part of the checksum database. directory defaults to the current directory.

checkold [directory]

Prints all files in the checksum database that do not exist on the filesystem anymore. directory defaults to the current directory.

rmold [-q] [directory]

Removes all files found by checkold from the database. directory defaults to the current directory.

-q suppresses the printing of each file as it is removed.

check [-q] [file/directory ...]

Verifies the checksums of all files given, recursing through any directories. If no files or directories are given, the current directory is used.

Note that the checksum database in the corresponding directory will be read again for every file given on the command line, even if 1000 files in the same directory are given. This problem does not occur when recursing through directories, so it is best to only give files directly when checking a few. This problem wouldn't be too difficult to fix, but, frankly, I'm too lazy, especially since I only added the feature to check files individually as a convenience when I want to quickly check a single file in a large directory.

To explain why it is this way: The directory recursion is done using an iterator, which has the directories pushed onto its queue in the beginning. The iterator only returns directories, which are then checked all in one go, but this means that files given on the command line need to be handled specially.

-q suppresses the printing of all good checksums but still allows a message to be printed when a checksum failed.
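The queue-based recursion described for check can be pictured with the following Python sketch. It is an assumption-laden illustration, not a translation of the real code: the name iter_dirs is made up, and the sketch finds subdirectories by scanning the filesystem, whereas the actual program may follow the directory lists in the checksum database instead.

```python
import os
from collections import deque


def iter_dirs(start_dirs):
    """Yield every directory reachable from start_dirs, one at a time."""
    queue = deque(start_dirs)  # starting directories are pushed up front
    while queue:
        current = queue.popleft()
        yield current
        with os.scandir(current) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_dir(follow_symlinks=False):
                    queue.append(entry.path)
```

Since such an iterator only ever hands out directories, a caller can read each directory's database once and check all of its files together. A single file named on the command line does not fit into this queue, which is why it needs separate handling.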

clean [-q] [directory]

Removes all lumia special files used to store the checksum database from directory recursively. directory defaults to the current directory.

Note that this recurses through the entire directory tree, not just the part that is actually linked together by the checksum database.

Warning: This just blindly removes all files with one of the special lumia names, even if they weren't actually created by lumia.

-q suppresses the printing of each file as it is deleted.

extract [-v] [source] destination

Recreates the entire directory structure from source in destination, but only copies the special files used to store the checksum database. source defaults to the current directory.

-v prints each file as it is copied.

Note that this overwrites files in the destination directory without confirmation.

mkdir directory ...

Creates the given directories, initializing them with empty checksum database files.

update file ...

Recalculates the checksums for the given files and replaces them in the database.

Note: Directories given as arguments are ignored.

This is mainly meant to quickly "touch" a file after it was modified (e.g. a notes file that is occasionally updated).

updatespecial [directory]

Recalculates the checksums for the special files .lumidify_archive_dirs and .lumidify_archive_cksums and writes them to .lumidify_archive_cksums.cksum. directory defaults to the current directory.

This is only meant to be used if, for some reason, the checksum files had to be edited manually and thus don't match the checksums in .lumidify_archive_cksums.cksum anymore.

rm [-f] file ...

Removes the given files and directories recursively from the filesystem and checksum database. The following caveats apply:

If any actual errors occur while deleting the file/directory (i.e. the system command rm returns a non-zero exit value), the checksum or directory is left in the database. If the system rm does not return a non-zero exit value, but the file/directory still exists afterwards (e.g. there was a permission error and the user answered "n" when prompted), a warning message is printed, but the files are removed from the database (if the database can be written to).

It is an error if there are no checksum database files in the directory of a file named on the command line.

-f is passed through to the system rm command.
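The first caveat above boils down to a small decision procedure, sketched here in Python. This is a simplified illustration rather than the program's code: remove_entry is a hypothetical name, and the database is modelled as a plain set of paths instead of the real special files.

```python
import os
import subprocess
import sys


def remove_entry(path, database):
    """Delete path with the system rm and update database (a set of paths)."""
    result = subprocess.run(["rm", "-r", "--", path])
    if result.returncode != 0:
        # rm reported a real error: leave the entry in the database.
        return False
    if os.path.lexists(path):
        # rm exited cleanly but the file is still there, for example
        # because the user answered "n" when prompted.
        print("warning: " + path + " still exists", file=sys.stderr)
    database.discard(path)
    return True
```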

cp [-vf] source target
cp [-vf] source ... directory

Copies the given source files, updating the checksum database in the process.

If the last argument is a file, there must be only one source argument, also a file, which is then copied to the target.

If the last argument is a directory, all source arguments are copied into it.

It is an error if a source or destination directory does not contain any checksum database files.

cp will issue a warning and skip to the next argument if it is asked to merge a directory with an already existing directory. For instance, attempting to run cp dir1 dir2, where dir2 already contains a directory named dir1, will result in a warning, and dir1 will be skipped. This may change in the future, when the program is modified to recursively copy the files manually, instead of simply calling the system cp on each of the arguments. If this were supported in the current version, none of the checksums inside that directory would be updated, so it wouldn't be very useful.

-v is passed through to the system cp command.

-f silently overwrites files without prompting the user, much like the -f option in the system cp command. This is handled manually by the program, though, in order to actually determine what the user chose. See also the caveat mentioned above.

mv [-f] source target
mv [-f] source ... directory

Moves the given source files, updating the checksum database in the process.

If the last argument is a file or does not exist, there must be only one source argument, which is renamed to the target name.

If the last argument is an existing directory, all source arguments are moved into it.

It is an error if a source or destination directory does not contain any checksum database files.

mv behaves the same as rm with regard to checking if the source file is still present after the operation and other error handling.

-f is handled in the same manner as with cp.

MOTIVATION

There are already several programs that can be used to check for bitrot, as listed in "SEE ALSO". However, all programs I tried either were much too complicated for my taste or just did everything behind my back. I wanted a simple tool that did exactly what I told it to and also allowed me to keep the old checksums when reorganizing files, in order to avoid regenerating the checksums from corrupt files. Since I couldn't find those features in any program I tried, I wrote my own.

DESIGN DECISIONS

It may strike some readers as a peculiar idea to save the checksum files in every single directory, but this choice was made after much deliberation. The other option I could think of was to have one big database, but that would have made all commands much more difficult to implement and additionally necessitated opening the entire database for every operation. With individual files in each directory, operations like cp become quite trivial (ignoring all the edge cases) since only the top-level checksums need to be copied to the new destination, and any subdirectories already contain the checksums.

This method is not without its drawbacks, however. The most glaring problem I have found is that there is no way to store the checksums of read-only directories or any special directories that cannot be littered with the checksum files because that would clash with other software. Despite these drawbacks, however, I decided to stick with it because it works for almost all cases and doesn't have any of the serious drawbacks that other options would have had.

The names of the special files were chosen to be ".lumidify_archive*" not out of vanity, but mainly because I couldn't think of any regular files with those names, making them a good choice to avoid clashes.

The name of the program, lumia (for "lumidify archive"), was similarly chosen because it did not clash with any programs installed on my system and thus allowed for easy tab-completion.

HASH ALGORITHMS

By default, the simple cksum algorithm is used to get the checksums. This is not very secure, but the main purpose of the program is to prevent bitrot, for which cksum should be sufficient, especially since it is much faster than other algorithms.

There is currently no convenient way to change the algorithm other than changing the $CKSUM_CMD and $CKSUM_NUMFIELDS variables at the top of lumia. $CKSUM_CMD must be the command that returns the checksum when it is given a file, and $CKSUM_NUMFIELDS specifies the number of space-separated fields the checksum consists of. This has to be specified in order to determine where the checksum ends and the filename begins in the output. This would be redundant if all implementations of cksum supported '-q' for outputting only the checksum, but that only seems to be supported by some implementations.
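To illustrate why the number of fields matters, here is a minimal Python sketch of splitting one line of checksum output. It is not taken from the program, and split_cksum_line is a made-up name. The POSIX cksum utility prints the checksum, the size in octets, and then the pathname, so its output has two fields before the filename. The sample values below are invented.

```python
def split_cksum_line(line, numfields):
    """Split checksum output into (checksum, filename)."""
    # Split at most numfields times so spaces in the filename survive.
    parts = line.split(" ", numfields)
    if len(parts) <= numfields:
        raise ValueError("not enough fields in checksum output")
    checksum = " ".join(parts[:numfields])
    filename = parts[numfields]
    return checksum, filename


# Two fields (CRC and size) precede the name in cksum-style output.
print(split_cksum_line("4038471504 75 my file.txt", 2))
```

A hash program that prints a single field before the filename would use a field count of 1 instead.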

USAGE SCENARIOS

Security auditing

This program is NOT designed to provide any security auditing, as should be clear from the fact that the checksums are stored right in the same directory as the files. See mtree(8) for that.

If you want to, however, you could set $CKSUM_CMD to a secure hash (not cksum) and extract the checksums to a separate directory, which you keep in a safe place. You could then use the regular cp command to simply replace all the checksums with the ones from your backup, in case an attacker modified the checksum database in the directory with the actual files you're trying to protect. I don't know if there would be any point in doing that, though.

Managing archives

This is the purpose I wrote the program for.

You can simply initialize your archive directory with the addnew command. Whenever you add new files, just run addnew again. If you want to reorganize the archive, you can use the limited commands available.

I usually just use rsync(1) to copy the entire archive directory over to other backup drives and then use the check command again on the new drive.

I also have checksums for the main data directory on my computer (except for things like git repositories, which I don't want littered with the database files). Here, I use the update command for files that I edit more often and occasionally run check on the entire directory.

Since the database files are written in each directory, you can run the addnew command in any subdirectory when you've added new files there.

PERFORMANCE

Due to the extensive use of iterators and the author's bad life choices, some functions, such as addnew and check, run more slowly than they would if they were programmed more efficiently, especially on many small files and folders. Too bad.

PORTABILITY

This program was written on OpenBSD. It will probably work on most other reasonably POSIX-compliant systems, although I cannot guarantee anything. $CKSUM_CMD may need to be modified at the top of lumia. The file operation commands are called directly with system(), so those need to be available.

It will most certainly not work on Windows, but that shouldn't be a problem for anyone important.

BUGS

All system commands (unless I forgot some) are called with "--" before listing the actual files, so files beginning with hyphens should be supported. I have tested the commands with filenames starting with spaces and hyphens and also containing newlines, but there may very well be issues still. Please notify me if you find any filenames that do not work. Handling filenames properly is difficult.

There are probably many other edge cases, especially in the mv, cp, and rm commands. Please notify me if you find an issue.

Operations on files containing newlines may cause Perl to print the warning "Unsuccessful stat on filename containing newline" even though nothing is wrong, since (as described in mv and rm) the existence of the file is checked afterwards. I didn't feel like disabling warnings, and no normal person should be working with files containing newlines anyway, so that's the way it is.

EXIT STATUS

Always 0, unless the arguments given were invalid. We don't do errors around here.

On a more serious note: I should probably change that at some point. For the time being, if you want to run check in a script, you can test the output printed when the -q option is used, since this won't output anything if there are no errors. Do note, though, that actual errors (file not found, etc.) are printed to STDERR, while incorrect checksums are printed to STDOUT.
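A script following this advice might look like the Python sketch below. The helper name check_passed and the archive path are placeholders. The sketch treats any output on either stream as a failure, which is a design choice of this example and not something the program enforces.

```python
import subprocess


def check_passed(command):
    """Return True if command wrote nothing to STDOUT or STDERR."""
    result = subprocess.run(command, capture_output=True)
    # Incorrect checksums appear on STDOUT, other errors on STDERR.
    return result.stdout == b"" and result.stderr == b""


# Hypothetical usage:
# if not check_passed(["lumia", "check", "-q", "/path/to/archive"]):
#     print("archive needs attention")
```

The output is compared as bytes so that unusual filenames in an error message cannot cause a decoding failure in the script itself.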

SEE ALSO

par2(1), mtree(8), aide(1), bitrot (no man page)

LICENSE

Copyright (c) 2019, 2020, 2021 lumidify <nobody[at]lumidify.org>

Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.