mirror of https://github.com/SlavikMIPT/tgcloud.git synced 2025-02-12 11:12:09 +00:00

Merge branch 'develop' of https://github.com/SlavikMIPT/tgcloud into develop

This commit is contained in:
Вячеслав Баженов 2019-06-16 18:39:03 +03:00
commit 37c2d3c02e
16 changed files with 2915 additions and 8 deletions

.gitmodules vendored

@@ -1,6 +0,0 @@
[submodule "filebrowser"]
path = filebrowser
url = https://github.com/SlavikMIPT/filebrowser
[submodule "dedupfs"]
path = dedupfs
url = https://github.com/xolox/dedupfs.git


@@ -1 +1,38 @@
# tgloader
# tgcloud
## UNDER DEVELOPMENT
## Open-source Virtual Filesystem for Telegram
Synchronizes and organizes files uploaded to Telegram.
- Stores only metadata, accessing raw data only when loading files.
- Download speed of up to 240 Mbit/s per session
- Multi-platform: provides standard volumes that can be mounted on Linux/Windows/macOS
- Open source
### Project structure:
**tgcloud:** Linux-based Docker container
* **redis** - updates, RPC, communication
* **tfs:** FUSE-based VFS module
  * [python-fuse](https://github.com/SlavikMIPT/tfs) - interface to the Linux kernel FS
  * redis storage - FS structure, metadata, Telegram file_id, settings
  * rq communication interface
  * docker
* **file_telegram_rxtx** - Telegram read/write driver
  * [telethon(sync)](https://github.com/SlavikMIPT/Telethon) by [@Lonami](https://github.com/Lonami) - Telegram access, multithreaded downloading/uploading
  * improved and tested by [@SlavikMIPT](https://github.com/SlavikMIPT) - load speed of 240 Mbit/s
  * rq communication interface
  * docker
* **polling daemon**
  * [telethon(asyncio)](https://github.com/SlavikMIPT/Telethon) - updates from Telegram, synchronization, hashtags
  * rq communication interface
  * docker
* **client**
  * Telegram authorization interface
  * [filebrowser](https://github.com/SlavikMIPT/filebrowser) - open-source Go file browser
  * Windows service
  * Telegram desktop client with file browser
  * settings, statistics, monitoring...
  * rq communication interface
  * docker
![Diagram](/img/ProjectDiagram.png)
You are welcome to collaborate - contact:
Telegram: [@SlavikMIPT](https://t.me/SlavikMIPT)
Channel: [@MediaTube_stream](https://t.me/MediaTube_stream)
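The metadata-only design described above can be sketched in a few lines. Everything here is hypothetical and illustrative (class and field names are invented, and a stub callback stands in for the Telethon download driver); the real system would keep the mapping in Redis and fetch raw bytes from Telegram.

```python
import posixpath

class MetadataStore:
    """Toy sketch: keep only per-file metadata (e.g. a Telegram file_id);
    raw bytes are fetched lazily, only when a file is actually read."""

    def __init__(self, fetch):
        self._meta = {}          # path -> {"file_id": ..., "size": ...}
        self._fetch = fetch      # callback that downloads raw data by file_id

    def register(self, path, file_id, size):
        self._meta[posixpath.normpath(path)] = {"file_id": file_id, "size": size}

    def getattr(self, path):
        # Metadata lookups never touch the network.
        return self._meta[posixpath.normpath(path)]

    def read(self, path):
        # Only here is the raw data pulled from the backend.
        return self._fetch(self._meta[posixpath.normpath(path)]["file_id"])

# Usage with a stub downloader standing in for the Telegram driver:
store = MetadataStore(fetch=lambda file_id: b"payload-for-" + file_id.encode())
store.register("/docs/a.txt", file_id="abc123", size=12)
print(store.getattr("/docs/a.txt")["size"])   # metadata only, no download
print(store.read("/docs/a.txt"))              # triggers the "download"
```

The point of the split is that directory listings and `stat` calls stay fast and offline; the network is involved only on actual reads.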

@@ -1 +0,0 @@
Subproject commit 78f0814a6f5e43915e0512273d8b26e87b3ae353

dedupfs/LICENSE Normal file

@@ -0,0 +1,21 @@
DedupFS is licensed under the MIT license.
Copyright 2010 Peter Odding <peter@peterodding.com>.
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

dedupfs/NOTES Normal file

@@ -0,0 +1,12 @@
The included fuse.py module contains a single-line bug fix, relative to the fuse.py
file shipped with Ubuntu's Python 2.6 package, in the method Timespec.__init__():
480c480
< def __init__(self, name, **kw):
---
> def __init__(self, **kw):
During initial development I used the following resources:
- http://sf.net/apps/mediawiki/fuse/index.php?title=FUSE_Python_Reference
- http://linux.die.net/man/2/path_resolution
- /usr/include/fuse/fuse.h :-(

dedupfs/README.md Normal file

@@ -0,0 +1,42 @@
# DedupFS: A deduplicating FUSE file system written in Python
The Python program [dedupfs.py](http://github.com/xolox/dedupfs/blob/master/dedupfs.py) implements a file system in user space using [FUSE](http://en.wikipedia.org/wiki/Filesystem_in_Userspace). It's called DedupFS because the file system's primary feature is [data deduplication](http://en.wikipedia.org/wiki/Data_deduplication), which enables it to store virtually unlimited copies of files because unchanged data is only stored once. In addition to deduplication the file system also supports transparent compression using the compression methods [lzo](http://en.wikipedia.org/wiki/LZO), [zlib](http://en.wikipedia.org/wiki/zlib) and [bz2](http://en.wikipedia.org/wiki/bz2). These properties make the file system ideal for backups: I'm currently storing 250 GB worth of backups using only 8 GB of disk space.
Several aspects of the design of DedupFS were inspired by [Venti](http://en.wikipedia.org/wiki/Venti) (ignoring the distributed aspect, for now…) and [ZFS](http://en.wikipedia.org/wiki/ZFS), though I've never personally used either. The [ArchiveFS](http://code.google.com/p/archivefs/) and [lessfs](http://www.lessfs.com/) projects share similar goals but have very different implementations.
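The core deduplication idea can be sketched briefly: blocks are addressed by a cryptographic hash of their content, so identical blocks are stored exactly once no matter how many files reference them. This is an illustrative sketch with invented names, not DedupFS's actual schema.

```python
import hashlib

class BlockStore:
    """Content-addressed store: a block's key is the SHA-1 of its bytes,
    so writing the same data twice consumes space only once."""

    def __init__(self):
        self.blocks = {}   # digest -> raw block (stored once)
        self.files = {}    # filename -> ordered list of digests

    def write(self, name, data, blocksize=4096):
        digests = []
        for i in range(0, len(data), blocksize):
            block = data[i:i + blocksize]
            digest = hashlib.sha1(block).hexdigest()
            self.blocks.setdefault(digest, block)   # deduplication happens here
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

store = BlockStore()
payload = b"x" * 10000
store.write("a.bin", payload)
store.write("b.bin", payload)            # a perfect duplicate
assert store.read("b.bin") == payload
print(len(store.blocks))                 # far fewer blocks than two full copies
```

In a real file system the `files` side would live in the metadata database and `blocks` in persistent storage, but the invariant is the same: the data store only ever grows by the amount of *unique* data written.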
## Usage
The following shell commands show how to install and use the DedupFS file system on [Ubuntu](http://www.ubuntu.com/) (where it was developed):
$ sudo apt-get install python-fuse
$ git clone git://github.com/xolox/dedupfs.git
$ mkdir mount_point
$ python dedupfs/dedupfs.py mount_point
# Now copy some files to mount_point/ and observe that the size of the two
# databases doesn't grow much when you copy duplicate files again :-)
# The two databases are by default stored in the following locations:
# - ~/.dedupfs-metastore.sqlite3 contains the tree and meta data
# - ~/.dedupfs-datastore.db contains the (compressed) data blocks
## Status
Development on DedupFS began as a proof of concept to find out how much disk space the author could free by employing deduplication to store his daily backups. Since then it's become more or less usable as a way to archive old backups, i.e. for secondary storage deduplication. It's not recommended to use the file system for primary storage though, simply because the file system is too slow. I also wouldn't recommend depending on DedupFS just yet, at least until a proper set of automated tests has been written and successfully run to prove the correctness of the code (the tests are being worked on).
The file system initially stored everything in a single [SQLite](http://www.sqlite.org/) database, but it turned out that after the database grew beyond 8 GB the write speed would drop from 8-12 MB/s to 2-3 MB/s. Therefore the file system now stores its data blocks in a separate database, which is a persistent key/value store managed by a [dbm](http://en.wikipedia.org/wiki/dbm) implementation like [gdbm](http://www.gnu.org/software/gdbm/gdbm.html) or [Berkeley DB](http://en.wikipedia.org/wiki/Berkeley_DB).
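The block-store half of that split can be mimicked with Python's standard `dbm` module. This is a minimal sketch under assumed conventions (content-hash keys, a temp-directory path), not DedupFS's actual storage layout; `dbm.dumb` is used here only because it is the portable fallback that is always available.

```python
import dbm.dumb
import hashlib
import os
import tempfile

# Open a persistent key/value datastore; blocks are keyed by content hash.
path = os.path.join(tempfile.mkdtemp(), "datastore")
db = dbm.dumb.open(path, "c")   # "c": create the database if it doesn't exist

block = b"some compressed data block"
key = hashlib.sha1(block).hexdigest().encode()
if key not in db:               # store each unique block exactly once
    db[key] = block

assert db[key] == block
db.close()                      # data persists on disk across reopens
```

A dbm-style store gives O(1)-ish lookups by key and avoids the single growing B-tree that made the all-in-SQLite layout slow down.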
### Limitations
In the current implementation a file's content needs to fit in a [cStringIO](http://docs.python.org/library/stringio.html#module-cStringIO) instance, which limits the maximum file size to your free RAM. Initially I implemented it this way because I was focusing on backups of web/mail servers, which don't contain files larger than 250 MB. Then I started copying virtual disk images and my file system blew up :-(. I know how to fix this but haven't implemented the change yet.
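One standard way around that RAM limitation (a sketch of the general technique, not what DedupFS actually does) is to buffer file contents in memory only up to a threshold and transparently spill larger files to disk, which Python's `tempfile.SpooledTemporaryFile` does out of the box:

```python
import tempfile

# Keep small files in memory but transparently spill large ones to disk,
# so the maximum file size is bounded by free disk space, not free RAM.
buf = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)  # 1 MiB threshold
buf.write(b"a" * (2 * 1024 * 1024))   # 2 MiB: rolls over to a real temp file
buf.seek(0)
data = buf.read()
print(len(data))   # 2097152
```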
## Dependencies
DedupFS was developed using Python 2.6, though it might also work on earlier versions. It definitely doesn't work with Python 3 yet though. It requires the [Python FUSE binding](http://sourceforge.net/apps/mediawiki/fuse/index.php?title=FUSE_Python_tutorial) in addition to several Python standard libraries like [anydbm](http://docs.python.org/library/anydbm.html), [sqlite3](http://docs.python.org/library/sqlite3.html), [hashlib](http://docs.python.org/library/hashlib.html) and [cStringIO](http://docs.python.org/library/stringio.html#module-cStringIO).
## Contact
If you have questions, bug reports, suggestions, etc. the author can be contacted at <peter@peterodding.com>. The latest version of DedupFS is available at <http://peterodding.com/code/dedupfs/> and <http://github.com/xolox/dedupfs>.
## License
This software is licensed under the MIT license.
© 2010 Peter Odding &lt;<peter@peterodding.com>&gt;.

dedupfs/TODO Normal file

@@ -0,0 +1,36 @@
Here are some things on my to-do list, in no particular order:
* Automatically switch to a larger block size to reduce the overhead for files
that rarely change after being created (like >= 100MB video files :-)
* Implement the fsync(datasync) API method?
if datasync:
only flush user data (file contents)
else:
flush user & meta data (file contents & attributes)
* Implement rename() independently of link()/unlink() to improve performance?
* Implement `--verify-reads` option that recalculates hashes when reading to
check for data block corruption?
* `report_disk_usage()` has become way too expensive for regular status
reports because it takes more than a minute on a 7.0 GB database. The only
way it might work was if the statistics are only retrieved from the database
once and from then on kept up to date inside Python, but that seems like an
awful lot of work. For now I've removed the call to `report_disk_usage()`
from `print_stats()` and added a `--print-stats` command-line option that
reports the disk usage and then exits.
* Tag databases with a version number and implement automatic upgrades because
I've grown tired of upgrading my database by hand :-)
* Change the project name because `DedupFS` is already used by at least two
other projects? One is a distributed file system which shouldn't cause too
much confusion, but the other is a deduplicating file system as well :-\
* Support directory hard links without upsetting FUSE and add a command-line
option that instructs `dedupfs.py` to search for identical subdirectories
and replace them with directory hard links.
* Support files that don't fit in RAM (virtual machine disk images…)
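The `--verify-reads` item above amounts to recording a checksum per block at write time and recomputing it on every read. A minimal sketch (function names invented here, not taken from dedupfs.py); in a content-addressed store the block's key *is* its hash, so no extra storage is needed:

```python
import hashlib

blocks = {}  # digest -> data, as in a content-addressed block store

def put(data):
    digest = hashlib.sha1(data).hexdigest()
    blocks[digest] = data
    return digest

def get(digest, verify=True):
    data = blocks[digest]
    # Recompute the hash on read; a mismatch means on-disk corruption.
    if verify and hashlib.sha1(data).hexdigest() != digest:
        raise IOError("data block corruption detected for %s" % digest)
    return data

key = put(b"block contents")
assert get(key) == b"block contents"

# Simulate on-disk corruption: verification now fails.
blocks[key] = b"flipped bits"
try:
    get(key)
except IOError:
    print("corruption caught")
```

The cost is one hash computation per block read, which is why it makes sense as an opt-in flag rather than the default.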

dedupfs/__init__.py Normal file

dedupfs/dedupfs.py Normal file

File diff suppressed because it is too large.

dedupfs/fuse.py Normal file

@@ -0,0 +1,982 @@
#
# Copyright (C) 2001 Jeff Epler <jepler@unpythonic.dhs.org>
# Copyright (C) 2006 Csaba Henk <csaba.henk@creo.hu>
#
# This program can be distributed under the terms of the GNU LGPL.
# See the file COPYING.
#
# suppress version mismatch warnings
try:
import warnings
warnings.filterwarnings('ignore',
'Python C API version mismatch',
RuntimeWarning,
)
except:
pass
from string import join
import sys
from errno import *
from os import environ
import re
from fuseparts import __version__
from fuseparts._fuse import main, FuseGetContext, FuseInvalidate
from fuseparts._fuse import FuseError, FuseAPIVersion
from fuseparts.subbedopts import SubOptsHive, SubbedOptFormatter
from fuseparts.subbedopts import SubbedOptIndentedFormatter, SubbedOptParse
from fuseparts.subbedopts import SUPPRESS_HELP, OptParseError
from fuseparts.setcompatwrap import set
##########
###
### API specification API.
###
##########
# The actual API version of this module
FUSE_PYTHON_API_VERSION = (0, 2)
def __getenv__(var, pattern='.', trans=lambda x: x):
"""
Fetch environment variable and optionally transform it. Return `None` if
variable is unset. Bail out if value of variable doesn't match (optional)
regex pattern.
"""
if var not in environ:
return None
val = environ[var]
rpat = pattern
if not isinstance(rpat, type(re.compile(''))):
rpat = re.compile(rpat)
if not rpat.search(val):
raise RuntimeError("env var %s doesn't match required pattern %s" % \
(var, `pattern`))
return trans(val)
def get_fuse_python_api():
if fuse_python_api:
return fuse_python_api
elif compat_0_1:
# deprecated way of API specification
return (0, 1)
def get_compat_0_1():
return get_fuse_python_api() == (0, 1)
# API version to be used
fuse_python_api = __getenv__('FUSE_PYTHON_API', '^[\d.]+$',
lambda x: tuple([int(i) for i in x.split('.')]))
# deprecated way of API specification
compat_0_1 = __getenv__('FUSE_PYTHON_COMPAT', '^(0.1|ALL)$', lambda x: True)
fuse_python_api = get_fuse_python_api()
##########
###
### Parsing for FUSE.
###
##########
class FuseArgs(SubOptsHive):
"""
Class representing a FUSE command line.
"""
fuse_modifiers = {'showhelp': '-ho',
'showversion': '-V',
'foreground': '-f'}
def __init__(self):
SubOptsHive.__init__(self)
self.modifiers = {}
self.mountpoint = None
for m in self.fuse_modifiers:
self.modifiers[m] = False
def __str__(self):
return '\n'.join(['< on ' + str(self.mountpoint) + ':',
' ' + str(self.modifiers), ' -o ']) + \
',\n '.join(self._str_core()) + \
' >'
def getmod(self, mod):
return self.modifiers[mod]
def setmod(self, mod):
self.modifiers[mod] = True
def unsetmod(self, mod):
self.modifiers[mod] = False
def mount_expected(self):
if self.getmod('showhelp'):
return False
if self.getmod('showversion'):
return False
return True
def assemble(self):
"""Mangle self into an argument array"""
self.canonify()
args = [sys.argv and sys.argv[0] or "python"]
if self.mountpoint:
args.append(self.mountpoint)
for m, v in self.modifiers.iteritems():
if v:
args.append(self.fuse_modifiers[m])
opta = []
for o, v in self.optdict.iteritems():
opta.append(o + '=' + v)
opta.extend(self.optlist)
if opta:
args.append("-o" + ",".join(opta))
return args
def filter(self, other=None):
"""
Same as for SubOptsHive, with the following difference:
if other is not specified, `Fuse.fuseoptref()` is run and its result
will be used.
"""
if not other:
other = Fuse.fuseoptref()
return SubOptsHive.filter(self, other)
class FuseFormatter(SubbedOptIndentedFormatter):
def __init__(self, **kw):
if not 'indent_increment' in kw:
kw['indent_increment'] = 4
SubbedOptIndentedFormatter.__init__(self, **kw)
def store_option_strings(self, parser):
SubbedOptIndentedFormatter.store_option_strings(self, parser)
# 27 is how the lib stock help appears
self.help_position = max(self.help_position, 27)
self.help_width = self.width - self.help_position
class FuseOptParse(SubbedOptParse):
"""
This class alters / enhances `SubbedOptParse` so that it's
suitable for usage with FUSE.
- When adding options, you can use the `mountopt` pseudo-attribute which
is equivalent with adding a subopt for option ``-o``
(it doesn't require an option argument).
- FUSE compatible help and version printing.
- Error and exit callbacks are relaxed. In case of FUSE, the command
line is to be treated as a DSL [#]_. You don't want this module to
force an exit on you just because you hit a DSL syntax error.
- Built-in support for conventional FUSE options (``-d``, ``-f``, ``-s``).
The way of this can be tuned by keyword arguments, see below.
.. [#] http://en.wikipedia.org/wiki/Domain-specific_programming_language
Keyword arguments for initialization
------------------------------------
standard_mods
Boolean [default is `True`].
Enables support for the usual interpretation of the ``-d``, ``-f``
options.
fetch_mp
Boolean [default is `True`].
If it's True, then the last (non-option) argument
(if there is such a thing) will be used as the FUSE mountpoint.
dash_s_do
String: ``whine``, ``undef``, or ``setsingle`` [default is ``whine``].
The ``-s`` option -- traditionally for asking for single-threadedness --
is an oddball: single/multi threadedness of a fuse-py fs doesn't depend
on the FUSE command line, we have direct control over it.
Therefore we have two conflicting principles:
- *Orthogonality*: option parsing shouldn't affect the backing `Fuse`
instance directly, only via its `fuse_args` attribute.
- *POLS*: behave like other FUSE based fs-es do. The stock FUSE help
makes mention of ``-s`` as a single-threadedness setter.
So, if we follow POLS and implement a conventional ``-s`` option, then
we have to go beyond the `fuse_args` attribute and set the respective
Fuse attribute directly, hence violating orthogonality.
We let the fs authors make their choice: ``dash_s_do=undef`` leaves this
option unhandled, and the fs author can add a handler as she desires.
``dash_s_do=setsingle`` enables the traditional behaviour.
Using ``dash_s_do=setsingle`` is not problematic at all, but we want fs
authors to be aware of the particularity of ``-s``, therefore the default is
the ``dash_s_do=whine`` setting which raises an exception for ``-s`` and
suggests the user to read this documentation.
dash_o_handler
Argument should be a SubbedOpt instance (created with
``action="store_hive"`` if you want it to be useful).
This lets you customize the handler of the ``-o`` option. For example,
you can alter or suppress the generic ``-o`` entry in help output.
"""
def __init__(self, *args, **kw):
self.mountopts = []
self.fuse_args = \
'fuse_args' in kw and kw.pop('fuse_args') or FuseArgs()
dsd = 'dash_s_do' in kw and kw.pop('dash_s_do') or 'whine'
if 'fetch_mp' in kw:
self.fetch_mp = bool(kw.pop('fetch_mp'))
else:
self.fetch_mp = True
if 'standard_mods' in kw:
smods = bool(kw.pop('standard_mods'))
else:
smods = True
if 'fuse' in kw:
self.fuse = kw.pop('fuse')
if not 'formatter' in kw:
kw['formatter'] = FuseFormatter()
doh = 'dash_o_handler' in kw and kw.pop('dash_o_handler')
SubbedOptParse.__init__(self, *args, **kw)
if doh:
self.add_option(doh)
else:
self.add_option('-o', action='store_hive',
subopts_hive=self.fuse_args, help="mount options",
metavar="opt,[opt...]")
if smods:
self.add_option('-f', action='callback',
callback=lambda *a: self.fuse_args.setmod('foreground'),
help=SUPPRESS_HELP)
self.add_option('-d', action='callback',
callback=lambda *a: self.fuse_args.add('debug'),
help=SUPPRESS_HELP)
if dsd == 'whine':
def dsdcb(option, opt_str, value, parser):
raise RuntimeError, """
! If you want the "-s" option to work, pass
!
! dash_s_do='setsingle'
!
! to the Fuse constructor. See docstring of the FuseOptParse class for an
! explanation of why it is not set by default.
"""
elif dsd == 'setsingle':
def dsdcb(option, opt_str, value, parser):
self.fuse.multithreaded = False
elif dsd == 'undef':
dsdcb = None
else:
raise ArgumentError, "key `dash_s_do': uninterpreted value " + str(dsd)
if dsdcb:
self.add_option('-s', action='callback', callback=dsdcb,
help=SUPPRESS_HELP)
def exit(self, status=0, msg=None):
if msg:
sys.stderr.write(msg)
def error(self, msg):
SubbedOptParse.error(self, msg)
raise OptParseError, msg
def print_help(self, file=sys.stderr):
SubbedOptParse.print_help(self, file)
print >> file
self.fuse_args.setmod('showhelp')
def print_version(self, file=sys.stderr):
SubbedOptParse.print_version(self, file)
self.fuse_args.setmod('showversion')
def parse_args(self, args=None, values=None):
o, a = SubbedOptParse.parse_args(self, args, values)
if a and self.fetch_mp:
self.fuse_args.mountpoint = a.pop()
return o, a
def add_option(self, *opts, **attrs):
if 'mountopt' in attrs:
if opts or 'subopt' in attrs:
raise OptParseError(
"having options or specifying the `subopt' attribute conflicts with `mountopt' attribute")
opts = ('-o',)
attrs['subopt'] = attrs.pop('mountopt')
if not 'dest' in attrs:
attrs['dest'] = attrs['subopt']
SubbedOptParse.add_option(self, *opts, **attrs)
##########
###
### The FUSE interface.
###
##########
class ErrnoWrapper(object):
def __init__(self, func):
self.func = func
def __call__(self, *args, **kw):
try:
return apply(self.func, args, kw)
except (IOError, OSError), detail:
# Sometimes this is an int, sometimes an instance...
if hasattr(detail, "errno"): detail = detail.errno
return -detail
########### Custom objects for transmitting system structures to FUSE
class FuseStruct(object):
def __init__(self, **kw):
for k in kw:
setattr(self, k, kw[k])
class Stat(FuseStruct):
"""
Auxiliary class which can be filled up with stat attributes.
The attributes are undefined by default.
"""
def __init__(self, **kw):
self.st_mode = None
self.st_ino = 0
self.st_dev = 0
self.st_nlink = None
self.st_uid = 0
self.st_gid = 0
self.st_size = 0
self.st_atime = 0
self.st_mtime = 0
self.st_ctime = 0
FuseStruct.__init__(self, **kw)
class StatVfs(FuseStruct):
"""
Auxiliary class which can be filled up with statvfs attributes.
The attributes are 0 by default.
"""
def __init__(self, **kw):
self.f_bsize = 0
self.f_frsize = 0
self.f_blocks = 0
self.f_bfree = 0
self.f_bavail = 0
self.f_files = 0
self.f_ffree = 0
self.f_favail = 0
self.f_flag = 0
self.f_namemax = 0
FuseStruct.__init__(self, **kw)
class Direntry(FuseStruct):
"""
Auxiliary class for carrying directory entry data.
Initialized with `name`. Further attributes (each
set to 0 as default):
offset
An integer (or long) parameter, used as a bookmark
during directory traversal.
This needs to be set if you want stateful directory
reading.
type
Directory entry type, should be one of the stat type
specifiers (stat.S_IFLNK, stat.S_IFBLK, stat.S_IFDIR,
stat.S_IFCHR, stat.S_IFREG, stat.S_IFIFO, stat.S_IFSOCK).
ino
Directory entry inode number.
Note that Python's standard directory reading interface is
stateless and provides only names, so the above optional
attributes don't make sense in that context.
"""
def __init__(self, name, **kw):
self.name = name
self.offset = 0
self.type = 0
self.ino = 0
FuseStruct.__init__(self, **kw)
class Flock(FuseStruct):
"""
Class for representing flock structures (cf. fcntl(3)).
It makes sense to give values to the `l_type`, `l_start`,
`l_len`, `l_pid` attributes (`l_whence` is not used by
FUSE, see ``fuse.h``).
"""
def __init__(self, name, **kw):
self.l_type = None
self.l_start = None
self.l_len = None
self.l_pid = None
FuseStruct.__init__(self, **kw)
class Timespec(FuseStruct):
"""
Cf. struct timespec in time.h:
http://www.opengroup.org/onlinepubs/009695399/basedefs/time.h.html
"""
def __init__(self, **kw):
self.tv_sec = None
self.tv_nsec = None
FuseStruct.__init__(self, **kw)
class FuseFileInfo(FuseStruct):
def __init__(self, **kw):
self.keep = False
self.direct_io = False
FuseStruct.__init__(self, **kw)
########## Interface for requiring certain features from your underlying FUSE library.
def feature_needs(*feas):
"""
Get info about the FUSE API version needed for the support of some features.
This function takes a variable number of feature patterns.
A feature pattern is either:
- an integer (directly referring to a FUSE API version number)
- a built-in feature specifier string (meaning defined by dictionary)
- a string of the form ``has_foo``, where ``foo`` is a filesystem method
(refers to the API version where the method has been introduced)
- a list/tuple of other feature patterns (matches each of its members)
- a regexp (meant to be matched against the builtins plus ``has_foo``
patterns; can also be given by a string of the from "re:*")
- a negated regexp (can be given by a string of the form "!re:*")
If called with no arguments, then the list of builtins is returned, mapped
to their meaning.
Otherwise the function returns the smallest FUSE API version number which
has all the matching features.
Builtin specifiers worth explicit mention:
- ``stateful_files``: you want to use custom filehandles (eg. a file class).
- ``*``: you want all features.
- while ``has_foo`` makes sense for all filesystem method ``foo``, some
of these can be found among the builtins, too (the ones which can be
handled by the general rule).
Specifiers like ``has_foo`` refer to the requirement that the library knows of
the fs method ``foo``.
"""
fmap = {'stateful_files': 22,
'stateful_dirs': 23,
'stateful_io': ('stateful_files', 'stateful_dirs'),
'stateful_files_keep_cache': 23,
'stateful_files_direct_io': 23,
'keep_cache': ('stateful_files_keep_cache',),
'direct_io': ('stateful_files_direct_io',),
'has_opendir': ('stateful_dirs',),
'has_releasedir': ('stateful_dirs',),
'has_fsyncdir': ('stateful_dirs',),
'has_create': 25,
'has_access': 25,
'has_fgetattr': 25,
'has_ftruncate': 25,
'has_fsinit': ('has_init'),
'has_fsdestroy': ('has_destroy'),
'has_lock': 26,
'has_utimens': 26,
'has_bmap': 26,
'has_init': 23,
'has_destroy': 23,
'*': '!re:^\*$'}
if not feas:
return fmap
def resolve(args, maxva):
for fp in args:
if isinstance(fp, int):
maxva[0] = max(maxva[0], fp)
continue
if isinstance(fp, list) or isinstance(fp, tuple):
for f in fp:
yield f
continue
ma = isinstance(fp, str) and re.compile("(!\s*|)re:(.*)").match(fp)
if isinstance(fp, type(re.compile(''))) or ma:
neg = False
if ma:
mag = ma.groups()
fp = re.compile(mag[1])
neg = bool(mag[0])
for f in fmap.keys() + ['has_' + a for a in Fuse._attrs]:
if neg != bool(re.search(fp, f)):
yield f
continue
ma = re.compile("has_(.*)").match(fp)
if ma and ma.groups()[0] in Fuse._attrs and not fp in fmap:
yield 21
continue
yield fmap[fp]
maxva = [0]
while feas:
feas = set(resolve(feas, maxva))
return maxva[0]
def APIVersion():
"""Get the API version of your underlying FUSE lib"""
return FuseAPIVersion()
def feature_assert(*feas):
"""
Takes some feature patterns (like in `feature_needs`).
Raises a fuse.FuseError if your underlying FUSE lib fails
to have some of the matching features.
(Note: use a ``has_foo`` type feature assertion only if lib support
for method ``foo`` is *necessary* for your fs. Don't use this assertion
just because your fs implements ``foo``. The usefulness of ``has_foo``
is limited by the fact that we can't guarantee that your FUSE kernel
module also supports ``foo``.)
"""
fav = APIVersion()
for fea in feas:
fn = feature_needs(fea)
if fav < fn:
raise FuseError(
"FUSE API version %d is required for feature `%s' but only %d is available" % \
(fn, str(fea), fav))
############# Subclass this.
class Fuse(object):
"""
Python interface to FUSE.
Basic usage:
- instantiate
- add options to `parser` attribute (an instance of `FuseOptParse`)
- call `parse`
- call `main`
"""
_attrs = ['getattr', 'readlink', 'readdir', 'mknod', 'mkdir',
'unlink', 'rmdir', 'symlink', 'rename', 'link', 'chmod',
'chown', 'truncate', 'utime', 'open', 'read', 'write', 'release',
'statfs', 'fsync', 'create', 'opendir', 'releasedir', 'fsyncdir',
'flush', 'fgetattr', 'ftruncate', 'getxattr', 'listxattr',
'setxattr', 'removexattr', 'access', 'lock', 'utimens', 'bmap',
'fsinit', 'fsdestroy']
fusage = "%prog [mountpoint] [options]"
def __init__(self, *args, **kw):
"""
Not much happens here apart from initializing the `parser` attribute.
Arguments are forwarded to the constructor of the parser class almost
unchanged.
The parser class is `FuseOptParse` unless you specify one using the
``parser_class`` keyword. (See `FuseOptParse` documentation for
available options.)
"""
if not fuse_python_api:
raise RuntimeError, __name__ + """.fuse_python_api not defined.
! Please define """ + __name__ + """.fuse_python_api internally (eg.
!
! (1) """ + __name__ + """.fuse_python_api = """ + `FUSE_PYTHON_API_VERSION` + """
!
! ) or in the environment (eg.
!
! (2) FUSE_PYTHON_API=0.1
!
! ).
!
! If you are actually developing a filesystem, probably (1) is the way to go.
! If you are using a filesystem written before 2007 Q2, probably (2) is what
! you want.
!
"""
def malformed():
raise RuntimeError, \
"malformatted fuse_python_api value " + `fuse_python_api`
if not isinstance(fuse_python_api, tuple):
malformed()
for i in fuse_python_api:
if not isinstance(i, int) or i < 0:
malformed()
if fuse_python_api > FUSE_PYTHON_API_VERSION:
raise RuntimeError, """
! You require FUSE-Python API version """ + `fuse_python_api` + """.
! However, the latest available is """ + `FUSE_PYTHON_API_VERSION` + """.
"""
self.fuse_args = \
'fuse_args' in kw and kw.pop('fuse_args') or FuseArgs()
if get_compat_0_1():
return self.__init_0_1__(*args, **kw)
self.multithreaded = True
if not 'usage' in kw:
kw['usage'] = self.fusage
if not 'fuse_args' in kw:
kw['fuse_args'] = self.fuse_args
kw['fuse'] = self
parserclass = \
'parser_class' in kw and kw.pop('parser_class') or FuseOptParse
self.parser = parserclass(*args, **kw)
self.methproxy = self.Methproxy()
def parse(self, *args, **kw):
"""Parse command line, fill `fuse_args` attribute."""
ev = 'errex' in kw and kw.pop('errex')
if ev and not isinstance(ev, int):
raise TypeError, "error exit value should be an integer"
try:
self.cmdline = self.parser.parse_args(*args, **kw)
except OptParseError:
if ev:
sys.exit(ev)
raise
return self.fuse_args
def main(self, args=None):
"""Enter filesystem service loop."""
if get_compat_0_1():
args = self.main_0_1_preamble()
d = {'multithreaded': self.multithreaded and 1 or 0}
d['fuse_args'] = args or self.fuse_args.assemble()
for t in 'file_class', 'dir_class':
if hasattr(self, t):
getattr(self.methproxy, 'set_' + t)(getattr(self, t))
for a in self._attrs:
b = a
if get_compat_0_1() and a in self.compatmap:
b = self.compatmap[a]
if hasattr(self, b):
c = ''
if get_compat_0_1() and hasattr(self, a + '_compat_0_1'):
c = '_compat_0_1'
d[a] = ErrnoWrapper(self.lowwrap(a + c))
try:
main(**d)
except FuseError:
if args or self.fuse_args.mount_expected():
raise
def lowwrap(self, fname):
"""
Wraps the fname method when the C code expects a different kind of
callback than we have in the fusepy API. (The wrapper is usually for
performing some checks or transformations which could be done in C but
is simpler if done in Python.)
Currently `open` and `create` are wrapped: a boolean flag is added
which indicates if the result is to be kept during the opened file's
lifetime or can be thrown away. Namely, it's considered disposable
if it's an instance of FuseFileInfo.
"""
fun = getattr(self, fname)
if fname in ('open', 'create'):
def wrap(*a, **kw):
res = fun(*a, **kw)
if not res or type(res) == type(0):
return res
else:
return (res, type(res) != FuseFileInfo)
elif fname == 'utimens':
def wrap(path, acc_sec, acc_nsec, mod_sec, mod_nsec):
ts_acc = Timespec(tv_sec=acc_sec, tv_nsec=acc_nsec)
ts_mod = Timespec(tv_sec=mod_sec, tv_nsec=mod_nsec)
return fun(path, ts_acc, ts_mod)
else:
wrap = fun
return wrap
def GetContext(self):
return FuseGetContext(self)
def Invalidate(self, path):
return FuseInvalidate(self, path)
def fuseoptref(cls):
"""
Find out which options are recognized by the library.
Result is a `FuseArgs` instance with the list of supported
options, suitable for passing on to the `filter` method of
another `FuseArgs` instance.
"""
import os, re
pr, pw = os.pipe()
pid = os.fork()
if pid == 0:
os.dup2(pw, 2)
os.close(pr)
fh = cls()
fh.fuse_args = FuseArgs()
fh.fuse_args.setmod('showhelp')
fh.main()
sys.exit()
os.close(pw)
fa = FuseArgs()
ore = re.compile("-o\s+([\w\[\]]+(?:=\w+)?)")
fpr = os.fdopen(pr)
for l in fpr:
m = ore.search(l)
if m:
o = m.groups()[0]
oa = [o]
# try to catch two-in-one options (like "[no]foo")
opa = o.split("[")
if len(opa) == 2:
o1, ox = opa
oxpa = ox.split("]")
if len(oxpa) == 2:
oo, o2 = oxpa
oa = [o1 + o2, o1 + oo + o2]
for o in oa:
fa.add(o)
fpr.close()
return fa
fuseoptref = classmethod(fuseoptref)
class Methproxy(object):
def __init__(self):
class mpx(object):
def __init__(self, name):
self.name = name
def __call__(self, *a, **kw):
return getattr(a[-1], self.name)(*(a[1:-1]), **kw)
self.proxyclass = mpx
self.mdic = {}
self.file_class = None
self.dir_class = None
def __call__(self, meth):
return meth in self.mdic and self.mdic[meth] or None
def _add_class_type(cls, type, inits, proxied):
def setter(self, xcls):
setattr(self, type + '_class', xcls)
for m in inits:
self.mdic[m] = xcls
for m in proxied:
if hasattr(xcls, m):
self.mdic[m] = self.proxyclass(m)
setattr(cls, 'set_' + type + '_class', setter)
_add_class_type = classmethod(_add_class_type)
Methproxy._add_class_type('file', ('open', 'create'),
('read', 'write', 'fsync', 'release', 'flush',
'fgetattr', 'ftruncate', 'lock'))
Methproxy._add_class_type('dir', ('opendir',),
('readdir', 'fsyncdir', 'releasedir'))
def __getattr__(self, meth):
m = self.methproxy(meth)
if m:
return m
raise AttributeError, "Fuse instance has no attribute '%s'" % meth
##########
###
### Compat stuff.
###
##########
def __init_0_1__(self, *args, **kw):
self.flags = 0
multithreaded = 0
# default attributes
if args == ():
# there is a self.optlist.append() later on, make sure it won't
# bomb out.
self.optlist = []
else:
self.optlist = args
self.optdict = kw
if len(self.optlist) == 1:
self.mountpoint = self.optlist[0]
else:
self.mountpoint = None
# grab command-line arguments, if any.
# Those will override whatever parameters
# were passed to __init__ directly.
argv = sys.argv
argc = len(argv)
if argc > 1:
# we've been given the mountpoint
self.mountpoint = argv[1]
if argc > 2:
# we've received mount args
optstr = argv[2]
opts = optstr.split(",")
for o in opts:
try:
k, v = o.split("=", 1)
self.optdict[k] = v
except:
self.optlist.append(o)
def main_0_1_preamble(self):
cfargs = FuseArgs()
cfargs.mountpoint = self.mountpoint
if hasattr(self, 'debug'):
cfargs.add('debug')
if hasattr(self, 'allow_other'):
cfargs.add('allow_other')
if hasattr(self, 'kernel_cache'):
cfargs.add('kernel_cache')
return cfargs.assemble()
def getattr_compat_0_1(self, *a):
from os import stat_result
return stat_result(self.getattr(*a))
def statfs_compat_0_1(self, *a):
oout = self.statfs(*a)
lo = len(oout)
svf = StatVfs()
svf.f_bsize = oout[0] # 0
svf.f_frsize = oout[lo >= 8 and 7 or 0] # 1
svf.f_blocks = oout[1] # 2
svf.f_bfree = oout[2] # 3
svf.f_bavail = oout[3] # 4
svf.f_files = oout[4] # 5
svf.f_ffree = oout[5] # 6
svf.f_favail = lo >= 9 and oout[8] or 0 # 7
svf.f_flag = lo >= 10 and oout[9] or 0 # 8
svf.f_namemax = oout[6] # 9
return svf
def readdir_compat_0_1(self, path, offset, *fh):
for name, type in self.getdir(path):
de = Direntry(name)
de.type = type
yield de
compatmap = {'readdir': 'getdir'}
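The `statfs_compat_0_1` shim above maps the legacy positional statfs tuple onto statvfs fields, with length-dependent fallbacks for the optional trailing entries. A minimal standalone sketch of that mapping (`statvfs_from_tuple` is a hypothetical helper written for illustration; fuse.py itself fills a `StatVfs` object rather than returning a dict):

```python
# Sketch of the tuple-to-statvfs mapping performed by statfs_compat_0_1.
# `statvfs_from_tuple` is hypothetical, for illustration only.
def statvfs_from_tuple(oout):
    lo = len(oout)
    return {
        'f_bsize':   oout[0],
        # f_frsize falls back to f_bsize when the tuple is short
        'f_frsize':  oout[7] if lo >= 8 else oout[0],
        'f_blocks':  oout[1],
        'f_bfree':   oout[2],
        'f_bavail':  oout[3],
        'f_files':   oout[4],
        'f_ffree':   oout[5],
        'f_favail':  oout[8] if lo >= 9 else 0,
        'f_flag':    oout[9] if lo >= 10 else 0,
        'f_namemax': oout[6],
    }

# A legacy 7-element statfs() result exercises all the fallbacks:
svf = statvfs_from_tuple((4096, 100, 50, 40, 1000, 900, 255))
```

With only seven entries, `f_frsize` mirrors `f_bsize` and `f_favail`/`f_flag` default to zero, matching the `lo >= 8 and 7 or 0` style checks in the original.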


@@ -0,0 +1,47 @@
#!/usr/bin/python
"""
This Python module determines the memory usage of the current process
by reading the VmSize value from /proc/$pid/status. It's based on the
following entry in the Python cookbook:
http://code.activestate.com/recipes/286222/
"""
import os
_units = { 'KB': 1024, 'MB': 1024**2, 'GB': 1024**3 }
_handle = open('/proc/%d/status' % os.getpid())
def get_memory_usage():
  try:
    for line in _handle:
      if line.startswith('VmSize:'):
        label, count, unit = line.split()
        return int(count) * _units[unit.upper()]
    return 0
  except (IOError, KeyError, ValueError):
    return 0
  finally:
    _handle.seek(0)
if __name__ == '__main__':
from my_formats import format_size
megabyte = 1024**2
counter = megabyte
limit = megabyte * 50
memory = []
old_memory_usage = get_memory_usage()
assert old_memory_usage > 0
while counter < limit:
memory.append('a' * counter)
msg = "I've just allocated %s and get_memory_usage() returns %s (%s more, deviation is %s)"
new_memory_usage = get_memory_usage()
difference = new_memory_usage - old_memory_usage
deviation = max(difference, counter) - min(difference, counter)
assert deviation < 1024*100
print msg % (format_size(counter), format_size(new_memory_usage), format_size(difference), format_size(deviation))
old_memory_usage = new_memory_usage
counter += megabyte
print "Stopped allocating new strings at %s" % format_size(limit)
# vim: ts=2 sw=2 et
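The VmSize parsing above can be exercised without touching /proc by feeding it a canned status line. A short sketch of the split-and-scale step (`parse_vmsize` is a hypothetical helper for illustration; the real script reads the line from /proc/$pid/status):

```python
# Standalone sketch of the VmSize parsing step used above.
_units = {'KB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3}

def parse_vmsize(line):
    # A status line looks like: "VmSize:    123456 kB"
    label, count, unit = line.split()
    return int(count) * _units[unit.upper()]

print(parse_vmsize('VmSize:\t  2048 kB'))  # 2048 KiB expressed in bytes
```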

dedupfs/lzo/README Normal file

@@ -0,0 +1,8 @@
This is a simple binding to the canonical LZO compression library. To compile
first make sure you have both liblzo2 and the headers installed (on Ubuntu you
can install the packages `liblzo2' and `liblzo2-dev'), then run the command
`python setup.py build && python setup.py install'.
Please note that this is my first Python/C interfacing code so be careful :-)
- Peter Odding <peter@peterodding.com>

dedupfs/lzo/lzomodule.c Normal file

@@ -0,0 +1,149 @@
#define BLOCK_SIZE (1024 * 128)
#include <python2.6/Python.h>
#include <lzo/lzo1x.h>
#include <stdio.h>
/* The following formula gives the worst possible compressed size. */
#define lzo1x_worst_compress(x) ((x) + ((x) / 16) + 64 + 3)
/* The size of and pointer to the shared buffer. */
static int block_size, buffer_size = 0;
static unsigned char *shared_buffer = NULL;
/* Don't store the size of compressed blocks in headers and trust the user to
* configure the correct block size? */
static int omit_headers = 0;
/* Working memory required by LZO library, allocated on first use. */
static char *working_memory = NULL;
#define ADD_SIZE(p) (omit_headers ? (p) : ((p) + sizeof(int)))
#define SUB_SIZE(p) (omit_headers ? (p) : ((p) - sizeof(int)))
static unsigned char *
get_buffer(int length)
{
if (omit_headers) {
if (!shared_buffer)
shared_buffer = malloc(buffer_size);
} else if (!shared_buffer || buffer_size < length) {
free(shared_buffer);
shared_buffer = malloc(length);
buffer_size = length;
}
return shared_buffer;
}
static PyObject *
set_block_size(PyObject *self, PyObject *args)
{
int new_block_size;
if (!PyArg_ParseTuple(args, "i", &new_block_size))
  return NULL;
if (shared_buffer)
  free(shared_buffer);
block_size = new_block_size;
buffer_size = lzo1x_worst_compress(block_size);
shared_buffer = malloc(buffer_size);
omit_headers = 1;
Py_INCREF(Py_True);
return Py_True;
}
static PyObject *
lzo_compress(PyObject *self, PyObject *args)
{
const unsigned char *input;
unsigned char *output;
unsigned int inlen, status;
lzo_uint outlen;
/* Get the uncompressed string and its length. */
if (!PyArg_ParseTuple(args, "s#", &input, &inlen))
return NULL;
/* Make sure never to touch unallocated memory. */
if (omit_headers && inlen > block_size)
  return PyErr_Format(PyExc_ValueError, "The given input of %i bytes is larger than the configured block size of %i bytes!", inlen, block_size);
/* Allocate the working memory on first use? */
if (!working_memory && !(working_memory = malloc(LZO1X_999_MEM_COMPRESS)))
return PyErr_NoMemory();
/* Allocate the output buffer. */
outlen = lzo1x_worst_compress(inlen);
output = get_buffer(ADD_SIZE(outlen));
if (!output)
return PyErr_NoMemory();
/* Store the input size in the header of the compressed block? */
if (!omit_headers)
*((int*)output) = inlen;
/* Compress the input string. The default LZO compression function is
* lzo1x_1_compress(). There's also variants like lzo1x_1_15_compress() which
* is faster and lzo1x_999_compress() which achieves higher compression. */
status = lzo1x_1_15_compress(input, inlen, ADD_SIZE(output), &outlen, working_memory);
if (status != LZO_E_OK)
return PyErr_Format(PyExc_Exception, "lzo_compress() failed with error code %i!", status);
/* Return the compressed string. */
return Py_BuildValue("s#", output, ADD_SIZE(outlen));
}
static PyObject *
lzo_decompress(PyObject *self, PyObject *args)
{
const unsigned char *input;
unsigned char *output;
int inlen, outlen_expected = 0, status;
lzo_uint outlen_actual;
/* Get the compressed string and its length. */
if (!PyArg_ParseTuple(args, "s#", &input, &inlen))
return NULL;
/* Get the length of the uncompressed string? */
if (!omit_headers)
outlen_expected = *((int*)input);
/* Allocate the output buffer. */
output = get_buffer(outlen_expected);
if (!output)
return PyErr_NoMemory();
/* Decompress the compressed string. */
status = lzo1x_decompress(ADD_SIZE(input), SUB_SIZE(inlen), output, &outlen_actual, NULL);
if (status != LZO_E_OK)
return PyErr_Format(PyExc_Exception, "lzo_decompress() failed with error code %i!", status);
/* Verify the length of the uncompressed data? */
if (!omit_headers && outlen_expected != outlen_actual)
return PyErr_Format(PyExc_Exception, "The expected length (%i) doesn't match the actual uncompressed length (%i)!", outlen_expected, (int)outlen_actual);
/* Return the decompressed string. */
return Py_BuildValue("s#", output, (int)outlen_actual);
}
static PyMethodDef functions[] = {
{ "compress", lzo_compress, METH_VARARGS, "Compress a string using the LZO algorithm." },
{ "decompress", lzo_decompress, METH_VARARGS, "Decompress a string that was previously compressed using the compress() function of this same module." },
{ "set_block_size", set_block_size, METH_VARARGS, "Set the max. length of the strings you will be compressing and/or decompressing so that the LZO module can allocate a single buffer shared by all lzo.compress() and lzo.decompress() calls." },
{ NULL, NULL, 0, NULL }
};
PyMODINIT_FUNC
initlzo(void)
{
int status;
if ((status = lzo_init()) != LZO_E_OK)
PyErr_Format(PyExc_Exception, "Failed to initialize the LZO library! (lzo_init() failed with error code %i)", status);
else if (!Py_InitModule("lzo", functions))
PyErr_Format(PyExc_Exception, "Failed to register module functions!");
}
/* vim: set ts=2 sw=2 et : */
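The `lzo1x_worst_compress()` macro above bounds the compressed size of a block, and (with headers enabled) each stored block is prefixed with its uncompressed length. A quick Python sanity check of both (sketch only; the real code is C, and `sizeof(int)` is assumed to be 4 here):

```python
# Sanity check of the worst-case bound and the size header used by
# lzomodule.c above (assumes a 4-byte int, as struct.pack('i') produces).
import struct

def lzo1x_worst_compress(x):
    # Mirrors the C macro: x + x/16 + 64 + 3
    return x + (x // 16) + 64 + 3

BLOCK_SIZE = 1024 * 128
print(lzo1x_worst_compress(BLOCK_SIZE))  # upper bound for one block

# With headers enabled, the stored form of an n-byte payload is the
# uncompressed length followed by the compressed data:
payload = b'x' * 10
stored = struct.pack('i', len(payload)) + payload
assert len(stored) == len(payload) + 4
```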

dedupfs/lzo/setup.py Normal file

@@ -0,0 +1,4 @@
from distutils.core import setup, Extension
setup(name = "LZO", version = "1.0",
ext_modules = [Extension("lzo", ["lzomodule.c"], libraries=['lzo2'])])

dedupfs/my_formats.py Normal file

@@ -0,0 +1,38 @@
from math import floor
def format_timespan(seconds): # {{{1
"""
Format a timespan in seconds as a human-readable string.
"""
result = []
units = [('day', 60 * 60 * 24), ('hour', 60 * 60), ('minute', 60), ('second', 1)]
for name, size in units:
if seconds >= size:
count = seconds / size
seconds %= size
result.append('%i %s%s' % (count, name, floor(count) != 1 and 's' or ''))
if result == []:
return 'less than a second'
if len(result) == 1:
return result[0]
else:
return ', '.join(result[:-1]) + ' and ' + result[-1]
def format_size(nbytes):
"""
Format a byte count as a human-readable file size.
"""
return nbytes < 1024 and '%i bytes' % nbytes \
or nbytes < (1024 ** 2) and __round(nbytes, 1024, 'KB') \
or nbytes < (1024 ** 3) and __round(nbytes, 1024 ** 2, 'MB') \
or nbytes < (1024 ** 4) and __round(nbytes, 1024 ** 3, 'GB') \
or __round(nbytes, 1024 ** 4, 'TB')
def __round(nbytes, divisor, suffix):
nbytes = float(nbytes) / divisor
if floor(nbytes) == nbytes:
return str(int(nbytes)) + ' ' + suffix
else:
return '%.2f %s' % (nbytes, suffix)
# vim: ts=2 sw=2 et
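The rounding rule in `__round()` above prints whole multiples without a fraction and everything else with two decimals. A standalone replay of that rule (`round_size` is a hypothetical name for illustration):

```python
# Replay of the rounding rule in __round(): whole multiples print
# without a fraction, everything else with two digits.
from math import floor

def round_size(nbytes, divisor, suffix):
    value = float(nbytes) / divisor
    if floor(value) == value:
        return '%i %s' % (value, suffix)
    return '%.2f %s' % (value, suffix)

print(round_size(1024, 1024, 'KB'))   # -> 1 KB
print(round_size(1536, 1024, 'KB'))   # -> 1.50 KB
```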

dedupfs/tests.sh Normal file

@@ -0,0 +1,242 @@
#!/bin/bash
TIMESTAMP="`date +%s`"
ROOTDIR="/tmp/dedupfs-tests-$TIMESTAMP"
MOUNTPOINT="$ROOTDIR/mountpoint"
METASTORE="$ROOTDIR/metastore.sqlite3"
DATASTORE="$ROOTDIR/datastore.db"
WAITTIME=1
TESTNO=1
# Initialization. {{{1
FAIL () {
FAIL_INTERNAL "$@"
exit 1
}
MESSAGE () {
tput bold
echo "$@" >&2
tput sgr0
}
FAIL_INTERNAL () {
echo -ne '\033[31m' >&2
MESSAGE "$@"
echo -ne '\033[0m' >&2
}
CLEANUP () {
DO_UNMOUNT
if ! rm -R "$ROOTDIR"; then
FAIL_INTERNAL "$0:$LINENO: Failed to delete temporary directory!"
fi
}
# Create the root and mount directories.
mkdir -p "$MOUNTPOINT"
if [ ! -d "$MOUNTPOINT" ]; then
FAIL "$0:$LINENO: Failed to create mount directory $MOUNTPOINT!"
exit 1
fi
DO_MOUNT () {
# Mount the file system using the two temporary databases.
python dedupfs.py -fv "$@" "--metastore=$METASTORE" "--datastore=$DATASTORE" "$MOUNTPOINT" &
# Wait a while before accessing the mount point, to
# make sure the file system has been fully initialized.
while true; do
sleep $WAITTIME
if mount | grep -q "$MOUNTPOINT"; then break; fi
done
}
DO_UNMOUNT () {
if mount | grep -q "$MOUNTPOINT"; then
sleep $WAITTIME
if ! fusermount -u "$MOUNTPOINT"; then
FAIL_INTERNAL "$0:$LINENO: Failed to unmount the mount point?!"
fi
while true; do
sleep $WAITTIME
if ! mount | grep -q "$MOUNTPOINT"; then break; fi
done
fi
}
DO_MOUNT --verify-writes --compress=lzo
# Tests 1-8: Test hard link counts with mkdir(), rmdir() and rename(). {{{1
CHECK_NLINK () {
NLINK=`ls -ld "$1" | awk '{print $2}'`
[ $NLINK -eq $2 ] || FAIL "$0:$3: Expected link count of $1 to be $2, got $NLINK!"
}
FEEDBACK () {
MESSAGE "Running test $1"
}
# Test 1: Check link count of file system root. {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
CHECK_NLINK "$MOUNTPOINT" 2 $LINENO
# Test 2: Check link count of newly created file. {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
FILE="$MOUNTPOINT/file_nlink_test"
touch "$FILE"
CHECK_NLINK "$FILE" 1 $LINENO
CHECK_NLINK "$MOUNTPOINT" 2 $LINENO
# Test 3: Check link count of hard link to existing file. {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
LINK="$MOUNTPOINT/link_to_file"
link "$FILE" "$LINK"
CHECK_NLINK "$FILE" 2 $LINENO
CHECK_NLINK "$LINK" 2 $LINENO
CHECK_NLINK "$MOUNTPOINT" 2 $LINENO
unlink "$LINK"
CHECK_NLINK "$FILE" 2 $LINENO
CHECK_NLINK "$MOUNTPOINT" 2 $LINENO
# Test 4: Check link count of newly created subdirectory. {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
SUBDIR="$MOUNTPOINT/dir1"
mkdir "$SUBDIR"
if [ ! -d "$SUBDIR" ]; then
FAIL "$0:$LINENO: Failed to create subdirectory $SUBDIR!"
fi
CHECK_NLINK "$SUBDIR" 2 $LINENO
# Test 5: Check that nlink of root is incremented by one (because of subdirectory created above). {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
CHECK_NLINK "$MOUNTPOINT" 3 $LINENO
# Test 6: Check that non-empty directories cannot be removed with rmdir(). {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
SUBFILE="$SUBDIR/file"
touch "$SUBFILE"
if rmdir "$SUBDIR" 2>/dev/null; then
FAIL "$0:$LINENO: rmdir() didn't fail when deleting a non-empty directory!"
elif ! rm -R "$SUBDIR"; then
FAIL "$0:$LINENO: Failed to recursively delete directory?!"
fi
# Test 7: Check that link count of root is decremented by one (because of subdirectory deleted above). {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
CHECK_NLINK "$MOUNTPOINT" 2 $LINENO
# Test 8: Check that link counts are updated when directories are renamed. {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
ORIGDIR="$MOUNTPOINT/original-directory"
REPLDIR="$MOUNTPOINT/replacement-directory"
mkdir -p "$ORIGDIR/subdir" "$REPLDIR/subdir"
for DIRNAME in "$ORIGDIR" "$REPLDIR"; do CHECK_NLINK "$DIRNAME" 3 $LINENO; done
mv -T "$ORIGDIR/subdir" "$REPLDIR/subdir"
CHECK_NLINK "$ORIGDIR" 2 $LINENO
CHECK_NLINK "$REPLDIR" 3 $LINENO
# Tests 9-14: Write random binary data to file system and verify that it reads back unchanged. {{{1
TESTDATA="$ROOTDIR/testdata"
WRITE_TESTNO=0
while [ $WRITE_TESTNO -le 5 ]; do
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
NBYTES=$[$RANDOM % (1024 * 257)]
head -c $NBYTES /dev/urandom > "$TESTDATA"
WRITE_FILE="$MOUNTPOINT/$RANDOM"
cp -a "$TESTDATA" "$WRITE_FILE"
sleep $WAITTIME
if ! cmp -s "$TESTDATA" "$WRITE_FILE"; then
(sleep 1
echo "Differences:"
ls -l "$TESTDATA" "$WRITE_FILE"
cmp -lb "$TESTDATA" "$WRITE_FILE") &
FAIL "$0:$LINENO: Failed to verify $WRITE_FILE of $NBYTES bytes!"
fi
WRITE_TESTNO=$[$WRITE_TESTNO + 1]
done
# Test 15: Verify that written data persists when remounted. {{{1
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
DO_UNMOUNT
DO_MOUNT --nogc # <- important for the following tests.
if ! cmp -s "$TESTDATA" "$WRITE_FILE"; then
(sleep 1
echo "Differences:"
ls -l "$TESTDATA" "$WRITE_FILE"
cmp -lb "$TESTDATA" "$WRITE_FILE") &
FAIL "$0:$LINENO: Failed to verify $WRITE_FILE of $NBYTES bytes!"
fi
# Test 16: Verify that garbage collection of unused data blocks works. {{{1
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
DO_UNMOUNT
FULL_SIZE=`ls -l "$DATASTORE" | awk '{print $5}'`
HALF_SIZE=$[$FULL_SIZE / 2]
DO_MOUNT
rm $MOUNTPOINT/* 2>/dev/null
DO_UNMOUNT
REDUCED_SIZE=`ls -l "$DATASTORE" | awk '{print $5}'`
[ $REDUCED_SIZE -lt $HALF_SIZE ] || FAIL "$0:$LINENO: Failed to verify effectiveness of data block garbage collection! (Full size of data store: $FULL_SIZE, reduced size: $REDUCED_SIZE)"
# Test 17: Verify that garbage collection of interned path segments works. {{{1
DO_MOUNT --nosync
SEGMENTGCDIR="$MOUNTPOINT/gc-of-segments-test"
mkdir "$SEGMENTGCDIR"
for ((i=0;i<=512;i+=1)); do
echo -ne "\rCreating segment $i"
touch "$SEGMENTGCDIR/$i"
done
echo -ne "\rSyncing to disk using unmount"
DO_UNMOUNT
FULL_SIZE=`ls -l "$METASTORE" | awk '{print $5}'`
HALF_SIZE=$[$FULL_SIZE / 2]
echo -ne "\rDeleting segments"
DO_MOUNT --nosync
rm -R "$SEGMENTGCDIR"
echo -ne "\rSyncing to disk using unmount"
DO_UNMOUNT
REDUCED_SIZE=`ls -l "$METASTORE" | awk '{print $5}'`
[ $REDUCED_SIZE -lt $HALF_SIZE ] || FAIL "$0:$LINENO: Failed to verify effectiveness of interned string garbage collection! (Full size of metadata store: $FULL_SIZE, reduced size: $REDUCED_SIZE)"
echo -ne "\r"
# Finalization. {{{1
CLEANUP
MESSAGE "All tests passed!"
# vim: ts=2 sw=2 et
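`CHECK_NLINK` in the test script scrapes the link count out of `ls -ld` output with awk; GNU `stat -c %h` reads the same field directly. A small sketch of the same check on a hard-linked file (assumes GNU coreutils):

```shell
# Sketch of the link-count check used by CHECK_NLINK above, reading the
# count with GNU `stat -c %h` instead of parsing `ls -ld` output.
d=$(mktemp -d)
touch "$d/file"
ln "$d/file" "$d/link"          # hard link bumps the count to 2
nlink=$(stat -c %h "$d/file")
[ "$nlink" -eq 2 ] || echo "unexpected link count: $nlink"
rm -r "$d"
echo "nlink=$nlink"
```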