mirror of https://github.com/SlavikMIPT/tgcloud.git synced 2025-02-12 11:12:09 +00:00

Merge branch 'develop' of https://github.com/SlavikMIPT/tgcloud into develop

This commit is contained in:
Вячеслав Баженов 2019-06-16 18:39:03 +03:00
commit 37c2d3c02e
16 changed files with 2915 additions and 8 deletions

.gitmodules vendored

@@ -1,6 +0,0 @@
[submodule "filebrowser"]
path = filebrowser
url = https://github.com/SlavikMIPT/filebrowser
[submodule "dedupfs"]
path = dedupfs
url = https://github.com/xolox/dedupfs.git


@@ -1 +1,38 @@
# tgloader
# tgcloud
## UNDER DEVELOPMENT
## Open-source Virtual Filesystem for Telegram
Synchronizes and organizes files uploaded to Telegram.
- Stores only metadata, accessing raw data only when loading files.
- Download speed of up to 240 Mbit/s per session
- Multi-platform: provides standard volumes that can be mounted on Linux/Windows/macOS
- Open source
### Project structure:
**tgcloud:** Linux-based Docker container
* **redis** - updates, RPC, communication
* **tfs:** FUSE-based VFS module
  * [python-fuse](https://github.com/SlavikMIPT/tfs) - interface to the Linux kernel FS
  * redis storage - FS structure, metadata, Telegram file_id, settings
  * rq communication interface
  * docker
* **file_telegram_rxtx** - Telegram read/write driver
  * [telethon(sync)](https://github.com/SlavikMIPT/Telethon) by [@Lonami](https://github.com/Lonami) - Telegram access, multithreaded downloading/uploading
  * improved and tested by [@SlavikMIPT](https://github.com/SlavikMIPT) - load speed of 240 Mbit/s
  * rq communication interface
  * docker
* **polling daemon**
  * [telethon(asyncio)](https://github.com/SlavikMIPT/Telethon) - updates from Telegram, synchronization, hashtags
  * rq communication interface
  * docker
* **client**
  * Telegram authorization interface
  * [filebrowser](https://github.com/SlavikMIPT/filebrowser) - open-source Go file browser
  * Windows service
  * Telegram desktop client with file browser
  * settings, statistics, monitoring...
  * rq communication interface
  * docker
![Diagram](/img/ProjectDiagram.png)
You are welcome to collaborate - contact:
Telegram: [@SlavikMIPT](https://t.me/SlavikMIPT)
Channel: [@MediaTube_stream](https://t.me/MediaTube_stream)
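The metadata-only design described above can be sketched in a few lines. Everything here is hypothetical and illustrative (class and field names are invented, and a stub callback stands in for the Telethon download driver); the real system would keep the mapping in Redis and fetch raw bytes from Telegram.

```python
import posixpath

class MetadataStore:
    """Toy sketch: keep only per-file metadata (e.g. a Telegram file_id);
    raw bytes are fetched lazily, only when a file is actually read."""

    def __init__(self, fetch):
        self._meta = {}          # path -> {"file_id": ..., "size": ...}
        self._fetch = fetch      # callback that downloads raw data by file_id

    def register(self, path, file_id, size):
        self._meta[posixpath.normpath(path)] = {"file_id": file_id, "size": size}

    def getattr(self, path):
        # Metadata lookups never touch the network.
        return self._meta[posixpath.normpath(path)]

    def read(self, path):
        # Only here is the raw data pulled from the backend.
        return self._fetch(self._meta[posixpath.normpath(path)]["file_id"])

# Usage with a stub downloader standing in for the Telegram driver:
store = MetadataStore(fetch=lambda file_id: b"payload-for-" + file_id.encode())
store.register("/docs/a.txt", file_id="abc123", size=12)
print(store.getattr("/docs/a.txt")["size"])   # metadata only, no download
print(store.read("/docs/a.txt"))              # triggers the "download"
```

The point of the split is that directory listings and `stat` calls stay fast and offline; the network is involved only on actual reads.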

@@ -1 +0,0 @@
Subproject commit 78f0814a6f5e43915e0512273d8b26e87b3ae353

dedupfs/LICENSE Normal file

@@ -0,0 +1,21 @@
DedupFS is licensed under the MIT license.
Copyright 2010 Peter Odding <peter@peterodding.com>.
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

dedupfs/NOTES Normal file

@@ -0,0 +1,12 @@
The included fuse.py module contains a single-line bug fix, relative to the fuse.py
file shipped with Ubuntu's Python 2.6 package, in the method Timespec.__init__():
480c480
< def __init__(self, name, **kw):
---
> def __init__(self, **kw):
During initial development I used the following resources:
- http://sf.net/apps/mediawiki/fuse/index.php?title=FUSE_Python_Reference
- http://linux.die.net/man/2/path_resolution
- /usr/include/fuse/fuse.h :-(

dedupfs/README.md Normal file

@@ -0,0 +1,42 @@
# DedupFS: A deduplicating FUSE file system written in Python
The Python program [dedupfs.py](http://github.com/xolox/dedupfs/blob/master/dedupfs.py) implements a file system in user space using [FUSE](http://en.wikipedia.org/wiki/Filesystem_in_Userspace). It's called DedupFS because the file system's primary feature is [data deduplication](http://en.wikipedia.org/wiki/Data_deduplication), which enables it to store virtually unlimited copies of files because unchanged data is only stored once. In addition to deduplication the file system also supports transparent compression using the compression methods [lzo](http://en.wikipedia.org/wiki/LZO), [zlib](http://en.wikipedia.org/wiki/zlib) and [bz2](http://en.wikipedia.org/wiki/bz2). These properties make the file system ideal for backups: I'm currently storing 250 GB worth of backups using only 8 GB of disk space.
Several aspects of the design of DedupFS were inspired by [Venti](http://en.wikipedia.org/wiki/Venti) (ignoring the distributed aspect, for now…) and [ZFS](http://en.wikipedia.org/wiki/ZFS), though I've never personally used either. The [ArchiveFS](http://code.google.com/p/archivefs/) and [lessfs](http://www.lessfs.com/) projects share similar goals but have very different implementations.
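The core deduplication idea can be sketched briefly: blocks are addressed by a cryptographic hash of their content, so identical blocks are stored exactly once no matter how many files reference them. This is an illustrative sketch with invented names, not DedupFS's actual schema.

```python
import hashlib

class BlockStore:
    """Content-addressed store: a block's key is the SHA-1 of its bytes,
    so writing the same data twice consumes space only once."""

    def __init__(self):
        self.blocks = {}   # digest -> raw block (stored once)
        self.files = {}    # filename -> ordered list of digests

    def write(self, name, data, blocksize=4096):
        digests = []
        for i in range(0, len(data), blocksize):
            block = data[i:i + blocksize]
            digest = hashlib.sha1(block).hexdigest()
            self.blocks.setdefault(digest, block)   # deduplication happens here
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

store = BlockStore()
payload = b"x" * 10000
store.write("a.bin", payload)
store.write("b.bin", payload)            # a perfect duplicate
assert store.read("b.bin") == payload
print(len(store.blocks))                 # far fewer blocks than two full copies
```

In a real file system the `files` side would live in the metadata database and `blocks` in persistent storage, but the invariant is the same: the data store only ever grows by the amount of *unique* data written.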
## Usage
The following shell commands show how to install and use the DedupFS file system on [Ubuntu](http://www.ubuntu.com/) (where it was developed):
$ sudo apt-get install python-fuse
$ git clone git://github.com/xolox/dedupfs.git
$ mkdir mount_point
$ python dedupfs/dedupfs.py mount_point
# Now copy some files to mount_point/ and observe that the size of the two
# databases doesn't grow much when you copy duplicate files again :-)
# The two databases are by default stored in the following locations:
# - ~/.dedupfs-metastore.sqlite3 contains the tree and meta data
# - ~/.dedupfs-datastore.db contains the (compressed) data blocks
## Status
Development on DedupFS began as a proof of concept to find out how much disk space the author could free by employing deduplication to store his daily backups. Since then it's become more or less usable as a way to archive old backups, i.e. for secondary storage deduplication. It's not recommended to use the file system for primary storage though, simply because the file system is too slow. I also wouldn't recommend depending on DedupFS just yet, at least until a proper set of automated tests has been written and successfully run to prove the correctness of the code (the tests are being worked on).
The file system initially stored everything in a single [SQLite](http://www.sqlite.org/) database, but it turned out that after the database grew beyond 8 GB the write speed would drop from 8-12 MB/s to 2-3 MB/s. Therefore the file system now stores its data blocks in a separate database, which is a persistent key/value store managed by a [dbm](http://en.wikipedia.org/wiki/dbm) implementation like [gdbm](http://www.gnu.org/software/gdbm/gdbm.html) or [Berkeley DB](http://en.wikipedia.org/wiki/Berkeley_DB).
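The block-store half of that split can be mimicked with Python's standard `dbm` module. This is a minimal sketch under assumed conventions (content-hash keys, a temp-directory path), not DedupFS's actual storage layout; `dbm.dumb` is used here only because it is the portable fallback that is always available.

```python
import dbm.dumb
import hashlib
import os
import tempfile

# Open a persistent key/value datastore; blocks are keyed by content hash.
path = os.path.join(tempfile.mkdtemp(), "datastore")
db = dbm.dumb.open(path, "c")   # "c": create the database if it doesn't exist

block = b"some compressed data block"
key = hashlib.sha1(block).hexdigest().encode()
if key not in db:               # store each unique block exactly once
    db[key] = block

assert db[key] == block
db.close()                      # data persists on disk across reopens
```

A dbm-style store gives O(1)-ish lookups by key and avoids the single growing B-tree that made the all-in-SQLite layout slow down.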
### Limitations
In the current implementation a file's content needs to fit in a [cStringIO](http://docs.python.org/library/stringio.html#module-cStringIO) instance, which limits the maximum file size to your free RAM. Initially I implemented it this way because I was focusing on backups of web/mail servers, which don't contain files larger than 250 MB. Then I started copying virtual disk images and my file system blew up :-(. I know how to fix this but haven't implemented the change yet.
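One standard way around that RAM limitation (a sketch of the general technique, not what DedupFS actually does) is to buffer file contents in memory only up to a threshold and transparently spill larger files to disk, which Python's `tempfile.SpooledTemporaryFile` does out of the box:

```python
import tempfile

# Keep small files in memory but transparently spill large ones to disk,
# so the maximum file size is bounded by free disk space, not free RAM.
buf = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)  # 1 MiB threshold
buf.write(b"a" * (2 * 1024 * 1024))   # 2 MiB: rolls over to a real temp file
buf.seek(0)
data = buf.read()
print(len(data))   # 2097152
```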
## Dependencies
DedupFS was developed using Python 2.6, though it might also work on earlier versions. It definitely doesn't work with Python 3 yet though. It requires the [Python FUSE binding](http://sourceforge.net/apps/mediawiki/fuse/index.php?title=FUSE_Python_tutorial) in addition to several Python standard libraries like [anydbm](http://docs.python.org/library/anydbm.html), [sqlite3](http://docs.python.org/library/sqlite3.html), [hashlib](http://docs.python.org/library/hashlib.html) and [cStringIO](http://docs.python.org/library/stringio.html#module-cStringIO).
## Contact
If you have questions, bug reports, suggestions, etc. the author can be contacted at <peter@peterodding.com>. The latest version of DedupFS is available at <http://peterodding.com/code/dedupfs/> and <http://github.com/xolox/dedupfs>.
## License
This software is licensed under the MIT license.
© 2010 Peter Odding &lt;<peter@peterodding.com>&gt;.

dedupfs/TODO Normal file

@@ -0,0 +1,36 @@
Here are some things on my to-do list, in no particular order:
* Automatically switch to a larger block size to reduce the overhead for files
that rarely change after being created (like >= 100MB video files :-)
* Implement the fsync(datasync) API method?
if datasync:
only flush user data (file contents)
else:
flush user & meta data (file contents & attributes)
* Implement rename() independently of link()/unlink() to improve performance?
* Implement `--verify-reads` option that recalculates hashes when reading to
check for data block corruption?
* `report_disk_usage()` has become way too expensive for regular status
reports because it takes more than a minute on a 7.0 GB database. The only
way it might work was if the statistics are only retrieved from the database
once and from then on kept up to date inside Python, but that seems like an
awful lot of work. For now I've removed the call to `report_disk_usage()`
from `print_stats()` and added a `--print-stats` command-line option that
reports the disk usage and then exits.
* Tag databases with a version number and implement automatic upgrades because
I've grown tired of upgrading my database by hand :-)
* Change the project name because `DedupFS` is already used by at least two
other projects? One is a distributed file system which shouldn't cause too
much confusion, but the other is a deduplicating file system as well :-\
* Support directory hard links without upsetting FUSE and add a command-line
option that instructs `dedupfs.py` to search for identical subdirectories
and replace them with directory hard links.
* Support files that don't fit in RAM (virtual machine disk images…)
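The `--verify-reads` item above amounts to recording a checksum per block at write time and recomputing it on every read. A minimal sketch (function names invented here, not taken from dedupfs.py); in a content-addressed store the block's key *is* its hash, so no extra storage is needed:

```python
import hashlib

blocks = {}  # digest -> data, as in a content-addressed block store

def put(data):
    digest = hashlib.sha1(data).hexdigest()
    blocks[digest] = data
    return digest

def get(digest, verify=True):
    data = blocks[digest]
    # Recompute the hash on read; a mismatch means on-disk corruption.
    if verify and hashlib.sha1(data).hexdigest() != digest:
        raise IOError("data block corruption detected for %s" % digest)
    return data

key = put(b"block contents")
assert get(key) == b"block contents"

# Simulate on-disk corruption: verification now fails.
blocks[key] = b"flipped bits"
try:
    get(key)
except IOError:
    print("corruption caught")
```

The cost is one hash computation per block read, which is why it makes sense as an opt-in flag rather than the default.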

dedupfs/__init__.py Normal file

dedupfs/dedupfs.py Normal file

File diff suppressed because it is too large.

dedupfs/fuse.py Normal file

@@ -0,0 +1,982 @@
#
# Copyright (C) 2001 Jeff Epler <jepler@unpythonic.dhs.org>
# Copyright (C) 2006 Csaba Henk <csaba.henk@creo.hu>
#
# This program can be distributed under the terms of the GNU LGPL.
# See the file COPYING.
#
# suppress version mismatch warnings
try:
import warnings
warnings.filterwarnings('ignore',
'Python C API version mismatch',
RuntimeWarning,
)
except:
pass
from string import join
import sys
from errno import *
from os import environ
import re
from fuseparts import __version__
from fuseparts._fuse import main, FuseGetContext, FuseInvalidate
from fuseparts._fuse import FuseError, FuseAPIVersion
from fuseparts.subbedopts import SubOptsHive, SubbedOptFormatter
from fuseparts.subbedopts import SubbedOptIndentedFormatter, SubbedOptParse
from fuseparts.subbedopts import SUPPRESS_HELP, OptParseError
from fuseparts.setcompatwrap import set
##########
###
### API specification API.
###
##########
# The actual API version of this module
FUSE_PYTHON_API_VERSION = (0, 2)
def __getenv__(var, pattern='.', trans=lambda x: x):
"""
Fetch environment variable and optionally transform it. Return `None` if
variable is unset. Bail out if value of variable doesn't match (optional)
regex pattern.
"""
if var not in environ:
return None
val = environ[var]
rpat = pattern
if not isinstance(rpat, type(re.compile(''))):
rpat = re.compile(rpat)
if not rpat.search(val):
raise RuntimeError("env var %s doesn't match required pattern %s" % \
(var, `pattern`))
return trans(val)
def get_fuse_python_api():
if fuse_python_api:
return fuse_python_api
elif compat_0_1:
# deprecated way of API specification
return (0, 1)
def get_compat_0_1():
return get_fuse_python_api() == (0, 1)
# API version to be used
fuse_python_api = __getenv__('FUSE_PYTHON_API', '^[\d.]+$',
lambda x: tuple([int(i) for i in x.split('.')]))
# deprecated way of API specification
compat_0_1 = __getenv__('FUSE_PYTHON_COMPAT', '^(0.1|ALL)$', lambda x: True)
fuse_python_api = get_fuse_python_api()
##########
###
### Parsing for FUSE.
###
##########
class FuseArgs(SubOptsHive):
"""
Class representing a FUSE command line.
"""
fuse_modifiers = {'showhelp': '-ho',
'showversion': '-V',
'foreground': '-f'}
def __init__(self):
SubOptsHive.__init__(self)
self.modifiers = {}
self.mountpoint = None
for m in self.fuse_modifiers:
self.modifiers[m] = False
def __str__(self):
return '\n'.join(['< on ' + str(self.mountpoint) + ':',
' ' + str(self.modifiers), ' -o ']) + \
',\n '.join(self._str_core()) + \
' >'
def getmod(self, mod):
return self.modifiers[mod]
def setmod(self, mod):
self.modifiers[mod] = True
def unsetmod(self, mod):
self.modifiers[mod] = False
def mount_expected(self):
if self.getmod('showhelp'):
return False
if self.getmod('showversion'):
return False
return True
def assemble(self):
"""Mangle self into an argument array"""
self.canonify()
args = [sys.argv and sys.argv[0] or "python"]
if self.mountpoint:
args.append(self.mountpoint)
for m, v in self.modifiers.iteritems():
if v:
args.append(self.fuse_modifiers[m])
opta = []
for o, v in self.optdict.iteritems():
opta.append(o + '=' + v)
opta.extend(self.optlist)
if opta:
args.append("-o" + ",".join(opta))
return args
def filter(self, other=None):
"""
Same as for SubOptsHive, with the following difference:
if other is not specified, `Fuse.fuseoptref()` is run and its result
will be used.
"""
if not other:
other = Fuse.fuseoptref()
return SubOptsHive.filter(self, other)
class FuseFormatter(SubbedOptIndentedFormatter):
def __init__(self, **kw):
if not 'indent_increment' in kw:
kw['indent_increment'] = 4
SubbedOptIndentedFormatter.__init__(self, **kw)
def store_option_strings(self, parser):
SubbedOptIndentedFormatter.store_option_strings(self, parser)
# 27 is how the lib stock help appears
self.help_position = max(self.help_position, 27)
self.help_width = self.width - self.help_position
class FuseOptParse(SubbedOptParse):
"""
This class alters / enhances `SubbedOptParse` so that it's
suitable for usage with FUSE.
- When adding options, you can use the `mountopt` pseudo-attribute which
is equivalent with adding a subopt for option ``-o``
(it doesn't require an option argument).
- FUSE compatible help and version printing.
- Error and exit callbacks are relaxed. In case of FUSE, the command
line is to be treated as a DSL [#]_. You don't want this module to
force an exit on you just because you hit a DSL syntax error.
- Built-in support for conventional FUSE options (``-d``, ``-f``, ``-s``).
The way of this can be tuned by keyword arguments, see below.
.. [#] http://en.wikipedia.org/wiki/Domain-specific_programming_language
Keyword arguments for initialization
------------------------------------
standard_mods
Boolean [default is `True`].
Enables support for the usual interpretation of the ``-d``, ``-f``
options.
fetch_mp
Boolean [default is `True`].
If it's True, then the last (non-option) argument
(if there is such a thing) will be used as the FUSE mountpoint.
dash_s_do
String: ``whine``, ``undef``, or ``setsingle`` [default is ``whine``].
The ``-s`` option -- traditionally for asking for single-threadedness --
is an oddball: single/multi threadedness of a fuse-py fs doesn't depend
on the FUSE command line, we have direct control over it.
Therefore we have two conflicting principles:
- *Orthogonality*: option parsing shouldn't affect the backing `Fuse`
instance directly, only via its `fuse_args` attribute.
- *POLS*: behave like other FUSE based fs-es do. The stock FUSE help
makes mention of ``-s`` as a single-threadedness setter.
So, if we follow POLS and implement a conventional ``-s`` option, then
we have to go beyond the `fuse_args` attribute and set the respective
Fuse attribute directly, hence violating orthogonality.
We let the fs authors make their choice: ``dash_s_do=undef`` leaves this
option unhandled, and the fs author can add a handler as she desires.
``dash_s_do=setsingle`` enables the traditional behaviour.
Using ``dash_s_do=setsingle`` is not problematic at all, but we want fs
authors to be aware of the particularity of ``-s``, therefore the default is
the ``dash_s_do=whine`` setting which raises an exception for ``-s`` and
suggests the user to read this documentation.
dash_o_handler
Argument should be a SubbedOpt instance (created with
``action="store_hive"`` if you want it to be useful).
This lets you customize the handler of the ``-o`` option. For example,
you can alter or suppress the generic ``-o`` entry in help output.
"""
def __init__(self, *args, **kw):
self.mountopts = []
self.fuse_args = \
'fuse_args' in kw and kw.pop('fuse_args') or FuseArgs()
dsd = 'dash_s_do' in kw and kw.pop('dash_s_do') or 'whine'
if 'fetch_mp' in kw:
self.fetch_mp = bool(kw.pop('fetch_mp'))
else:
self.fetch_mp = True
if 'standard_mods' in kw:
smods = bool(kw.pop('standard_mods'))
else:
smods = True
if 'fuse' in kw:
self.fuse = kw.pop('fuse')
if not 'formatter' in kw:
kw['formatter'] = FuseFormatter()
doh = 'dash_o_handler' in kw and kw.pop('dash_o_handler')
SubbedOptParse.__init__(self, *args, **kw)
if doh:
self.add_option(doh)
else:
self.add_option('-o', action='store_hive',
subopts_hive=self.fuse_args, help="mount options",
metavar="opt,[opt...]")
if smods:
self.add_option('-f', action='callback',
callback=lambda *a: self.fuse_args.setmod('foreground'),
help=SUPPRESS_HELP)
self.add_option('-d', action='callback',
callback=lambda *a: self.fuse_args.add('debug'),
help=SUPPRESS_HELP)
if dsd == 'whine':
def dsdcb(option, opt_str, value, parser):
raise RuntimeError, """
! If you want the "-s" option to work, pass
!
! dash_s_do='setsingle'
!
! to the Fuse constructor. See docstring of the FuseOptParse class for an
! explanation of why it is not set by default.
"""
elif dsd == 'setsingle':
def dsdcb(option, opt_str, value, parser):
self.fuse.multithreaded = False
elif dsd == 'undef':
dsdcb = None
else:
raise ArgumentError, "key `dash_s_do': uninterpreted value " + str(dsd)
if dsdcb:
self.add_option('-s', action='callback', callback=dsdcb,
help=SUPPRESS_HELP)
def exit(self, status=0, msg=None):
if msg:
sys.stderr.write(msg)
def error(self, msg):
SubbedOptParse.error(self, msg)
raise OptParseError, msg
def print_help(self, file=sys.stderr):
SubbedOptParse.print_help(self, file)
print >> file
self.fuse_args.setmod('showhelp')
def print_version(self, file=sys.stderr):
SubbedOptParse.print_version(self, file)
self.fuse_args.setmod('showversion')
def parse_args(self, args=None, values=None):
o, a = SubbedOptParse.parse_args(self, args, values)
if a and self.fetch_mp:
self.fuse_args.mountpoint = a.pop()
return o, a
def add_option(self, *opts, **attrs):
if 'mountopt' in attrs:
if opts or 'subopt' in attrs:
raise OptParseError(
"having options or specifying the `subopt' attribute conflicts with `mountopt' attribute")
opts = ('-o',)
attrs['subopt'] = attrs.pop('mountopt')
if not 'dest' in attrs:
attrs['dest'] = attrs['subopt']
SubbedOptParse.add_option(self, *opts, **attrs)
##########
###
### The FUSE interface.
###
##########
class ErrnoWrapper(object):
def __init__(self, func):
self.func = func
def __call__(self, *args, **kw):
try:
return apply(self.func, args, kw)
except (IOError, OSError), detail:
# Sometimes this is an int, sometimes an instance...
if hasattr(detail, "errno"): detail = detail.errno
return -detail
########### Custom objects for transmitting system structures to FUSE
class FuseStruct(object):
def __init__(self, **kw):
for k in kw:
setattr(self, k, kw[k])
class Stat(FuseStruct):
"""
Auxiliary class which can be filled up with stat attributes.
The attributes are undefined by default.
"""
def __init__(self, **kw):
self.st_mode = None
self.st_ino = 0
self.st_dev = 0
self.st_nlink = None
self.st_uid = 0
self.st_gid = 0
self.st_size = 0
self.st_atime = 0
self.st_mtime = 0
self.st_ctime = 0
FuseStruct.__init__(self, **kw)
class StatVfs(FuseStruct):
"""
Auxiliary class which can be filled up with statvfs attributes.
The attributes are 0 by default.
"""
def __init__(self, **kw):
self.f_bsize = 0
self.f_frsize = 0
self.f_blocks = 0
self.f_bfree = 0
self.f_bavail = 0
self.f_files = 0
self.f_ffree = 0
self.f_favail = 0
self.f_flag = 0
self.f_namemax = 0
FuseStruct.__init__(self, **kw)
class Direntry(FuseStruct):
"""
Auxiliary class for carrying directory entry data.
Initialized with `name`. Further attributes (each
set to 0 as default):
offset
An integer (or long) parameter, used as a bookmark
during directory traversal.
This needs to be set if you want stateful directory
reading.
type
Directory entry type, should be one of the stat type
specifiers (stat.S_IFLNK, stat.S_IFBLK, stat.S_IFDIR,
stat.S_IFCHR, stat.S_IFREG, stat.S_IFIFO, stat.S_IFSOCK).
ino
Directory entry inode number.
Note that Python's standard directory reading interface is
stateless and provides only names, so the above optional
attributes don't make sense in that context.
"""
def __init__(self, name, **kw):
self.name = name
self.offset = 0
self.type = 0
self.ino = 0
FuseStruct.__init__(self, **kw)
class Flock(FuseStruct):
"""
Class for representing flock structures (cf. fcntl(3)).
It makes sense to give values to the `l_type`, `l_start`,
`l_len`, `l_pid` attributes (`l_whence` is not used by
FUSE, see ``fuse.h``).
"""
def __init__(self, name, **kw):
self.l_type = None
self.l_start = None
self.l_len = None
self.l_pid = None
FuseStruct.__init__(self, **kw)
class Timespec(FuseStruct):
"""
Cf. struct timespec in time.h:
http://www.opengroup.org/onlinepubs/009695399/basedefs/time.h.html
"""
def __init__(self, **kw):
self.tv_sec = None
self.tv_nsec = None
FuseStruct.__init__(self, **kw)
class FuseFileInfo(FuseStruct):
def __init__(self, **kw):
self.keep = False
self.direct_io = False
FuseStruct.__init__(self, **kw)
########## Interface for requiring certain features from your underlying FUSE library.
def feature_needs(*feas):
"""
Get info about the FUSE API version needed for the support of some features.
This function takes a variable number of feature patterns.
A feature pattern is either:
- an integer (directly referring to a FUSE API version number)
- a built-in feature specifier string (meaning defined by dictionary)
- a string of the form ``has_foo``, where ``foo`` is a filesystem method
(refers to the API version where the method has been introduced)
- a list/tuple of other feature patterns (matches each of its members)
- a regexp (meant to be matched against the builtins plus ``has_foo``
patterns; can also be given by a string of the from "re:*")
- a negated regexp (can be given by a string of the form "!re:*")
If called with no arguments, then the list of builtins is returned, mapped
to their meaning.
Otherwise the function returns the smallest FUSE API version number which
has all the matching features.
Builtin specifiers worth explicit mention:
- ``stateful_files``: you want to use custom filehandles (eg. a file class).
- ``*``: you want all features.
- while ``has_foo`` makes sense for all filesystem method ``foo``, some
of these can be found among the builtins, too (the ones which can be
handled by the general rule).
Specifiers like ``has_foo`` refer to the requirement that the library knows of
the fs method ``foo``.
"""
fmap = {'stateful_files': 22,
'stateful_dirs': 23,
'stateful_io': ('stateful_files', 'stateful_dirs'),
'stateful_files_keep_cache': 23,
'stateful_files_direct_io': 23,
'keep_cache': ('stateful_files_keep_cache',),
'direct_io': ('stateful_files_direct_io',),
'has_opendir': ('stateful_dirs',),
'has_releasedir': ('stateful_dirs',),
'has_fsyncdir': ('stateful_dirs',),
'has_create': 25,
'has_access': 25,
'has_fgetattr': 25,
'has_ftruncate': 25,
'has_fsinit': ('has_init'),
'has_fsdestroy': ('has_destroy'),
'has_lock': 26,
'has_utimens': 26,
'has_bmap': 26,
'has_init': 23,
'has_destroy': 23,
'*': '!re:^\*$'}
if not feas:
return fmap
def resolve(args, maxva):
for fp in args:
if isinstance(fp, int):
maxva[0] = max(maxva[0], fp)
continue
if isinstance(fp, list) or isinstance(fp, tuple):
for f in fp:
yield f
continue
ma = isinstance(fp, str) and re.compile("(!\s*|)re:(.*)").match(fp)
if isinstance(fp, type(re.compile(''))) or ma:
neg = False
if ma:
mag = ma.groups()
fp = re.compile(mag[1])
neg = bool(mag[0])
for f in fmap.keys() + ['has_' + a for a in Fuse._attrs]:
if neg != bool(re.search(fp, f)):
yield f
continue
ma = re.compile("has_(.*)").match(fp)
if ma and ma.groups()[0] in Fuse._attrs and not fp in fmap:
yield 21
continue
yield fmap[fp]
maxva = [0]
while feas:
feas = set(resolve(feas, maxva))
return maxva[0]
def APIVersion():
"""Get the API version of your underlying FUSE lib"""
return FuseAPIVersion()
def feature_assert(*feas):
"""
Takes some feature patterns (like in `feature_needs`).
Raises a fuse.FuseError if your underlying FUSE lib fails
to have some of the matching features.
(Note: use a ``has_foo`` type feature assertion only if lib support
for method ``foo`` is *necessary* for your fs. Don't use this assertion
just because your fs implements ``foo``. The usefulness of ``has_foo``
is limited by the fact that we can't guarantee that your FUSE kernel
module also supports ``foo``.)
"""
fav = APIVersion()
for fea in feas:
fn = feature_needs(fea)
if fav < fn:
raise FuseError(
"FUSE API version %d is required for feature `%s' but only %d is available" % \
(fn, str(fea), fav))
############# Subclass this.
class Fuse(object):
"""
Python interface to FUSE.
Basic usage:
- instantiate
- add options to `parser` attribute (an instance of `FuseOptParse`)
- call `parse`
- call `main`
"""
_attrs = ['getattr', 'readlink', 'readdir', 'mknod', 'mkdir',
'unlink', 'rmdir', 'symlink', 'rename', 'link', 'chmod',
'chown', 'truncate', 'utime', 'open', 'read', 'write', 'release',
'statfs', 'fsync', 'create', 'opendir', 'releasedir', 'fsyncdir',
'flush', 'fgetattr', 'ftruncate', 'getxattr', 'listxattr',
'setxattr', 'removexattr', 'access', 'lock', 'utimens', 'bmap',
'fsinit', 'fsdestroy']
fusage = "%prog [mountpoint] [options]"
def __init__(self, *args, **kw):
"""
Not much happens here apart from initializing the `parser` attribute.
Arguments are forwarded to the constructor of the parser class almost
unchanged.
The parser class is `FuseOptParse` unless you specify one using the
``parser_class`` keyword. (See `FuseOptParse` documentation for
available options.)
"""
if not fuse_python_api:
raise RuntimeError, __name__ + """.fuse_python_api not defined.
! Please define """ + __name__ + """.fuse_python_api internally (eg.
!
! (1) """ + __name__ + """.fuse_python_api = """ + `FUSE_PYTHON_API_VERSION` + """
!
! ) or in the environment (eg.
!
! (2) FUSE_PYTHON_API=0.1
!
! ).
!
! If you are actually developing a filesystem, probably (1) is the way to go.
! If you are using a filesystem written before 2007 Q2, probably (2) is what
! you want.
!
"""
def malformed():
raise RuntimeError, \
"malformatted fuse_python_api value " + `fuse_python_api`
if not isinstance(fuse_python_api, tuple):
malformed()
for i in fuse_python_api:
if not isinstance(i, int) or i < 0:
malformed()
if fuse_python_api > FUSE_PYTHON_API_VERSION:
raise RuntimeError, """
! You require FUSE-Python API version """ + `fuse_python_api` + """.
! However, the latest available is """ + `FUSE_PYTHON_API_VERSION` + """.
"""
self.fuse_args = \
'fuse_args' in kw and kw.pop('fuse_args') or FuseArgs()
if get_compat_0_1():
return self.__init_0_1__(*args, **kw)
self.multithreaded = True
if not 'usage' in kw:
kw['usage'] = self.fusage
if not 'fuse_args' in kw:
kw['fuse_args'] = self.fuse_args
kw['fuse'] = self
parserclass = \
'parser_class' in kw and kw.pop('parser_class') or FuseOptParse
self.parser = parserclass(*args, **kw)
self.methproxy = self.Methproxy()
def parse(self, *args, **kw):
"""Parse command line, fill `fuse_args` attribute."""
ev = 'errex' in kw and kw.pop('errex')
if ev and not isinstance(ev, int):
raise TypeError, "error exit value should be an integer"
try:
self.cmdline = self.parser.parse_args(*args, **kw)
except OptParseError:
if ev:
sys.exit(ev)
raise
return self.fuse_args
def main(self, args=None):
"""Enter filesystem service loop."""
if get_compat_0_1():
args = self.main_0_1_preamble()
d = {'multithreaded': self.multithreaded and 1 or 0}
d['fuse_args'] = args or self.fuse_args.assemble()
for t in 'file_class', 'dir_class':
if hasattr(self, t):
getattr(self.methproxy, 'set_' + t)(getattr(self, t))
for a in self._attrs:
b = a
if get_compat_0_1() and a in self.compatmap:
b = self.compatmap[a]
if hasattr(self, b):
c = ''
if get_compat_0_1() and hasattr(self, a + '_compat_0_1'):
c = '_compat_0_1'
d[a] = ErrnoWrapper(self.lowwrap(a + c))
try:
main(**d)
except FuseError:
if args or self.fuse_args.mount_expected():
raise
def lowwrap(self, fname):
"""
Wraps the fname method when the C code expects a different kind of
callback than we have in the fusepy API. (The wrapper is usually for
performing some checks or transformations which could be done in C but
is simpler if done in Python.)
Currently `open` and `create` are wrapped: a boolean flag is added
which indicates if the result is to be kept during the opened file's
lifetime or can be thrown away. Namely, it's considered disposable
if it's an instance of FuseFileInfo.
"""
fun = getattr(self, fname)
if fname in ('open', 'create'):
def wrap(*a, **kw):
res = fun(*a, **kw)
if not res or type(res) == type(0):
return res
else:
return (res, type(res) != FuseFileInfo)
elif fname == 'utimens':
def wrap(path, acc_sec, acc_nsec, mod_sec, mod_nsec):
ts_acc = Timespec(tv_sec=acc_sec, tv_nsec=acc_nsec)
ts_mod = Timespec(tv_sec=mod_sec, tv_nsec=mod_nsec)
return fun(path, ts_acc, ts_mod)
else:
wrap = fun
return wrap
def GetContext(self):
return FuseGetContext(self)
def Invalidate(self, path):
return FuseInvalidate(self, path)
def fuseoptref(cls):
"""
Find out which options are recognized by the library.
Result is a `FuseArgs` instance with the list of supported
options, suitable for passing on to the `filter` method of
another `FuseArgs` instance.
"""
import os, re
pr, pw = os.pipe()
pid = os.fork()
if pid == 0:
os.dup2(pw, 2)
os.close(pr)
fh = cls()
fh.fuse_args = FuseArgs()
fh.fuse_args.setmod('showhelp')
fh.main()
sys.exit()
os.close(pw)
fa = FuseArgs()
ore = re.compile("-o\s+([\w\[\]]+(?:=\w+)?)")
fpr = os.fdopen(pr)
for l in fpr:
m = ore.search(l)
if m:
o = m.groups()[0]
oa = [o]
# try to catch two-in-one options (like "[no]foo")
opa = o.split("[")
if len(opa) == 2:
o1, ox = opa
oxpa = ox.split("]")
if len(oxpa) == 2:
oo, o2 = oxpa
oa = [o1 + o2, o1 + oo + o2]
for o in oa:
fa.add(o)
fpr.close()
return fa
fuseoptref = classmethod(fuseoptref)
class Methproxy(object):
def __init__(self):
class mpx(object):
def __init__(self, name):
self.name = name
def __call__(self, *a, **kw):
return getattr(a[-1], self.name)(*(a[1:-1]), **kw)
self.proxyclass = mpx
self.mdic = {}
self.file_class = None
self.dir_class = None
def __call__(self, meth):
return meth in self.mdic and self.mdic[meth] or None
def _add_class_type(cls, type, inits, proxied):
def setter(self, xcls):
setattr(self, type + '_class', xcls)
for m in inits:
self.mdic[m] = xcls
for m in proxied:
if hasattr(xcls, m):
self.mdic[m] = self.proxyclass(m)
setattr(cls, 'set_' + type + '_class', setter)
_add_class_type = classmethod(_add_class_type)
Methproxy._add_class_type('file', ('open', 'create'),
('read', 'write', 'fsync', 'release', 'flush',
'fgetattr', 'ftruncate', 'lock'))
Methproxy._add_class_type('dir', ('opendir',),
('readdir', 'fsyncdir', 'releasedir'))
def __getattr__(self, meth):
m = self.methproxy(meth)
if m:
return m
raise AttributeError, "Fuse instance has no attribute '%s'" % meth
##########
###
### Compat stuff.
###
##########
def __init_0_1__(self, *args, **kw):
self.flags = 0
multithreaded = 0
# default attributes
if args == ():
# there is a self.optlist.append() later on, make sure it won't
# bomb out.
self.optlist = []
else:
self.optlist = args
self.optdict = kw
if len(self.optlist) == 1:
self.mountpoint = self.optlist[0]
else:
self.mountpoint = None
# grab command-line arguments, if any.
# Those will override whatever parameters
# were passed to __init__ directly.
argv = sys.argv
argc = len(argv)
if argc > 1:
# we've been given the mountpoint
self.mountpoint = argv[1]
if argc > 2:
# we've received mount args
optstr = argv[2]
opts = optstr.split(",")
for o in opts:
try:
k, v = o.split("=", 1)
self.optdict[k] = v
except:
self.optlist.append(o)
def main_0_1_preamble(self):
cfargs = FuseArgs()
cfargs.mountpoint = self.mountpoint
if hasattr(self, 'debug'):
cfargs.add('debug')
if hasattr(self, 'allow_other'):
cfargs.add('allow_other')
if hasattr(self, 'kernel_cache'):
cfargs.add('kernel_cache')
return cfargs.assemble()
def getattr_compat_0_1(self, *a):
from os import stat_result
return stat_result(self.getattr(*a))
def statfs_compat_0_1(self, *a):
oout = self.statfs(*a)
lo = len(oout)
svf = StatVfs()
svf.f_bsize = oout[0] # 0
svf.f_frsize = oout[lo >= 8 and 7 or 0] # 1
svf.f_blocks = oout[1] # 2
svf.f_bfree = oout[2] # 3
svf.f_bavail = oout[3] # 4
svf.f_files = oout[4] # 5
svf.f_ffree = oout[5] # 6
svf.f_favail = lo >= 9 and oout[8] or 0 # 7
svf.f_flag = lo >= 10 and oout[9] or 0 # 8
svf.f_namemax = oout[6] # 9
return svf
def readdir_compat_0_1(self, path, offset, *fh):
for name, type in self.getdir(path):
de = Direntry(name)
de.type = type
yield de
compatmap = {'readdir': 'getdir'}
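The `statfs_compat_0_1` shim above maps the legacy positional statfs tuple onto statvfs fields, with length-dependent fallbacks for the optional trailing entries. A minimal standalone sketch of that mapping (`statvfs_from_tuple` is a hypothetical helper written for illustration; fuse.py itself fills a `StatVfs` object rather than returning a dict):

```python
# Sketch of the tuple-to-statvfs mapping performed by statfs_compat_0_1.
# `statvfs_from_tuple` is hypothetical, for illustration only.
def statvfs_from_tuple(oout):
    lo = len(oout)
    return {
        'f_bsize':   oout[0],
        # f_frsize falls back to f_bsize when the tuple is short
        'f_frsize':  oout[7] if lo >= 8 else oout[0],
        'f_blocks':  oout[1],
        'f_bfree':   oout[2],
        'f_bavail':  oout[3],
        'f_files':   oout[4],
        'f_ffree':   oout[5],
        'f_favail':  oout[8] if lo >= 9 else 0,
        'f_flag':    oout[9] if lo >= 10 else 0,
        'f_namemax': oout[6],
    }

# A legacy 7-element statfs() result exercises all the fallbacks:
svf = statvfs_from_tuple((4096, 100, 50, 40, 1000, 900, 255))
```

With only seven entries, `f_frsize` mirrors `f_bsize` and `f_favail`/`f_flag` default to zero, matching the `lo >= 8 and 7 or 0` style checks in the original.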


@@ -0,0 +1,47 @@
#!/usr/bin/python
"""
This Python module determines the memory usage of the current process
by reading the VmSize value from /proc/$pid/status. It's based on the
following entry in the Python cookbook:
http://code.activestate.com/recipes/286222/
"""
import os
_units = { 'KB': 1024, 'MB': 1024**2, 'GB': 1024**3 }
_handle = open('/proc/%d/status' % os.getpid())
def get_memory_usage():
  try:
    for line in _handle:
      if line.startswith('VmSize:'):
        label, count, unit = line.split()
        return int(count) * _units[unit.upper()]
    return 0
  except (IOError, KeyError, ValueError):
    return 0
  finally:
    _handle.seek(0)
if __name__ == '__main__':
from my_formats import format_size
megabyte = 1024**2
counter = megabyte
limit = megabyte * 50
memory = []
old_memory_usage = get_memory_usage()
assert old_memory_usage > 0
while counter < limit:
memory.append('a' * counter)
msg = "I've just allocated %s and get_memory_usage() returns %s (%s more, deviation is %s)"
new_memory_usage = get_memory_usage()
difference = new_memory_usage - old_memory_usage
deviation = max(difference, counter) - min(difference, counter)
assert deviation < 1024*100
print msg % (format_size(counter), format_size(new_memory_usage), format_size(difference), format_size(deviation))
old_memory_usage = new_memory_usage
counter += megabyte
print "Stopped allocating new strings at %s" % format_size(limit)
# vim: ts=2 sw=2 et
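The VmSize parsing above can be exercised without touching /proc by feeding it a canned status line. A short sketch of the split-and-scale step (`parse_vmsize` is a hypothetical helper for illustration; the real script reads the line from /proc/$pid/status):

```python
# Standalone sketch of the VmSize parsing step used above.
_units = {'KB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3}

def parse_vmsize(line):
    # A status line looks like: "VmSize:    123456 kB"
    label, count, unit = line.split()
    return int(count) * _units[unit.upper()]

print(parse_vmsize('VmSize:\t  2048 kB'))  # 2048 KiB expressed in bytes
```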

dedupfs/lzo/README Normal file

@@ -0,0 +1,8 @@
This is a simple binding to the canonical LZO compression library. To compile
first make sure you have both liblzo2 and the headers installed (on Ubuntu you
can install the packages `liblzo2' and `liblzo2-dev'), then run the command
`python setup.py build && python setup.py install'.
Please note that this is my first Python/C interfacing code so be careful :-)
- Peter Odding <peter@peterodding.com>

dedupfs/lzo/lzomodule.c Normal file

@@ -0,0 +1,149 @@
#define BLOCK_SIZE (1024 * 128)
#include <python2.6/Python.h>
#include <lzo/lzo1x.h>
#include <stdio.h>
/* The following formula gives the worst possible compressed size. */
#define lzo1x_worst_compress(x) ((x) + ((x) / 16) + 64 + 3)
/* The size of and pointer to the shared buffer. */
static int block_size, buffer_size = 0;
static unsigned char *shared_buffer = NULL;
/* Don't store the size of compressed blocks in headers and trust the user to
* configure the correct block size? */
static int omit_headers = 0;
/* Working memory required by LZO library, allocated on first use. */
static char *working_memory = NULL;
#define ADD_SIZE(p) (omit_headers ? (p) : ((p) + sizeof(int)))
#define SUB_SIZE(p) (omit_headers ? (p) : ((p) - sizeof(int)))
static unsigned char *
get_buffer(int length)
{
if (omit_headers) {
if (!shared_buffer)
shared_buffer = malloc(buffer_size);
} else if (!shared_buffer || buffer_size < length) {
free(shared_buffer);
shared_buffer = malloc(length);
buffer_size = length;
}
return shared_buffer;
}
static PyObject *
set_block_size(PyObject *self, PyObject *args)
{
int new_block_size;
if (!PyArg_ParseTuple(args, "i", &new_block_size))
  return NULL;
if (shared_buffer)
  free(shared_buffer);
block_size = new_block_size;
buffer_size = lzo1x_worst_compress(block_size);
shared_buffer = malloc(buffer_size);
omit_headers = 1;
Py_INCREF(Py_True);
return Py_True;
}
static PyObject *
lzo_compress(PyObject *self, PyObject *args)
{
const unsigned char *input;
unsigned char *output;
unsigned int inlen, status;
lzo_uint outlen;
/* Get the uncompressed string and its length. */
if (!PyArg_ParseTuple(args, "s#", &input, &inlen))
return NULL;
/* Make sure never to touch unallocated memory. */
if (omit_headers && inlen > block_size)
  return PyErr_Format(PyExc_ValueError, "The given input of %i bytes is larger than the configured block size of %i bytes!", inlen, block_size);
/* Allocate the working memory on first use? */
if (!working_memory && !(working_memory = malloc(LZO1X_999_MEM_COMPRESS)))
return PyErr_NoMemory();
/* Allocate the output buffer. */
outlen = lzo1x_worst_compress(inlen);
output = get_buffer(ADD_SIZE(outlen));
if (!output)
return PyErr_NoMemory();
/* Store the input size in the header of the compressed block? */
if (!omit_headers)
*((int*)output) = inlen;
/* Compress the input string. The default LZO compression function is
* lzo1x_1_compress(). There's also variants like lzo1x_1_15_compress() which
* is faster and lzo1x_999_compress() which achieves higher compression. */
status = lzo1x_1_15_compress(input, inlen, ADD_SIZE(output), &outlen, working_memory);
if (status != LZO_E_OK)
return PyErr_Format(PyExc_Exception, "lzo_compress() failed with error code %i!", status);
/* Return the compressed string. */
return Py_BuildValue("s#", output, ADD_SIZE(outlen));
}
static PyObject *
lzo_decompress(PyObject *self, PyObject *args)
{
const unsigned char *input;
unsigned char *output;
int inlen, outlen_expected = 0, status;
lzo_uint outlen_actual;
/* Get the compressed string and its length. */
if (!PyArg_ParseTuple(args, "s#", &input, &inlen))
return NULL;
/* Get the length of the uncompressed string? */
if (!omit_headers)
outlen_expected = *((int*)input);
/* Allocate the output buffer. */
output = get_buffer(outlen_expected);
if (!output)
return PyErr_NoMemory();
/* Decompress the compressed string. */
status = lzo1x_decompress(ADD_SIZE(input), SUB_SIZE(inlen), output, &outlen_actual, NULL);
if (status != LZO_E_OK)
return PyErr_Format(PyExc_Exception, "lzo_decompress() failed with error code %i!", status);
/* Verify the length of the uncompressed data? */
if (!omit_headers && outlen_expected != outlen_actual)
return PyErr_Format(PyExc_Exception, "The expected length (%i) doesn't match the actual uncompressed length (%i)!", outlen_expected, (int)outlen_actual);
/* Return the decompressed string. */
return Py_BuildValue("s#", output, (int)outlen_actual);
}
static PyMethodDef functions[] = {
{ "compress", lzo_compress, METH_VARARGS, "Compress a string using the LZO algorithm." },
{ "decompress", lzo_decompress, METH_VARARGS, "Decompress a string that was previously compressed using the compress() function of this same module." },
{ "set_block_size", set_block_size, METH_VARARGS, "Set the max. length of the strings you will be compressing and/or decompressing so that the LZO module can allocate a single buffer shared by all lzo.compress() and lzo.decompress() calls." },
{ NULL, NULL, 0, NULL }
};
PyMODINIT_FUNC
initlzo(void)
{
int status;
if ((status = lzo_init()) != LZO_E_OK)
PyErr_Format(PyExc_Exception, "Failed to initialize the LZO library! (lzo_init() failed with error code %i)", status);
else if (!Py_InitModule("lzo", functions))
PyErr_Format(PyExc_Exception, "Failed to register module functions!");
}
/* vim: set ts=2 sw=2 et : */
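The `lzo1x_worst_compress()` macro above bounds the compressed size of a block, and (with headers enabled) each stored block is prefixed with its uncompressed length. A quick Python sanity check of both (sketch only; the real code is C, and `sizeof(int)` is assumed to be 4 here):

```python
# Sanity check of the worst-case bound and the size header used by
# lzomodule.c above (assumes a 4-byte int, as struct.pack('i') produces).
import struct

def lzo1x_worst_compress(x):
    # Mirrors the C macro: x + x/16 + 64 + 3
    return x + (x // 16) + 64 + 3

BLOCK_SIZE = 1024 * 128
print(lzo1x_worst_compress(BLOCK_SIZE))  # upper bound for one block

# With headers enabled, the stored form of an n-byte payload is the
# uncompressed length followed by the compressed data:
payload = b'x' * 10
stored = struct.pack('i', len(payload)) + payload
assert len(stored) == len(payload) + 4
```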

dedupfs/lzo/setup.py Normal file

@@ -0,0 +1,4 @@
from distutils.core import setup, Extension
setup(name = "LZO", version = "1.0",
ext_modules = [Extension("lzo", ["lzomodule.c"], libraries=['lzo2'])])

dedupfs/my_formats.py Normal file

@@ -0,0 +1,38 @@
from math import floor
def format_timespan(seconds): # {{{1
"""
Format a timespan in seconds as a human-readable string.
"""
result = []
units = [('day', 60 * 60 * 24), ('hour', 60 * 60), ('minute', 60), ('second', 1)]
for name, size in units:
if seconds >= size:
count = seconds / size
seconds %= size
result.append('%i %s%s' % (count, name, floor(count) != 1 and 's' or ''))
if result == []:
return 'less than a second'
if len(result) == 1:
return result[0]
else:
return ', '.join(result[:-1]) + ' and ' + result[-1]
def format_size(nbytes):
"""
Format a byte count as a human-readable file size.
"""
return nbytes < 1024 and '%i bytes' % nbytes \
or nbytes < (1024 ** 2) and __round(nbytes, 1024, 'KB') \
or nbytes < (1024 ** 3) and __round(nbytes, 1024 ** 2, 'MB') \
or nbytes < (1024 ** 4) and __round(nbytes, 1024 ** 3, 'GB') \
or __round(nbytes, 1024 ** 4, 'TB')
def __round(nbytes, divisor, suffix):
nbytes = float(nbytes) / divisor
if floor(nbytes) == nbytes:
return str(int(nbytes)) + ' ' + suffix
else:
return '%.2f %s' % (nbytes, suffix)
# vim: ts=2 sw=2 et
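The rounding rule in `__round()` above prints whole multiples without a fraction and everything else with two decimals. A standalone replay of that rule (`round_size` is a hypothetical name for illustration):

```python
# Replay of the rounding rule in __round(): whole multiples print
# without a fraction, everything else with two digits.
from math import floor

def round_size(nbytes, divisor, suffix):
    value = float(nbytes) / divisor
    if floor(value) == value:
        return '%i %s' % (value, suffix)
    return '%.2f %s' % (value, suffix)

print(round_size(1024, 1024, 'KB'))   # -> 1 KB
print(round_size(1536, 1024, 'KB'))   # -> 1.50 KB
```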

dedupfs/tests.sh Normal file

@@ -0,0 +1,242 @@
#!/bin/bash
TIMESTAMP="`date +%s`"
ROOTDIR="/tmp/dedupfs-tests-$TIMESTAMP"
MOUNTPOINT="$ROOTDIR/mountpoint"
METASTORE="$ROOTDIR/metastore.sqlite3"
DATASTORE="$ROOTDIR/datastore.db"
WAITTIME=1
TESTNO=1
# Initialization. {{{1
FAIL () {
FAIL_INTERNAL "$@"
exit 1
}
MESSAGE () {
tput bold
echo "$@" >&2
tput sgr0
}
FAIL_INTERNAL () {
echo -ne '\033[31m' >&2
MESSAGE "$@"
echo -ne '\033[0m' >&2
}
CLEANUP () {
DO_UNMOUNT
if ! rm -R "$ROOTDIR"; then
FAIL_INTERNAL "$0:$LINENO: Failed to delete temporary directory!"
fi
}
# Create the root and mount directories.
mkdir -p "$MOUNTPOINT"
if [ ! -d "$MOUNTPOINT" ]; then
FAIL "$0:$LINENO: Failed to create mount directory $MOUNTPOINT!"
exit 1
fi
DO_MOUNT () {
# Mount the file system using the two temporary databases.
python dedupfs.py -fv "$@" "--metastore=$METASTORE" "--datastore=$DATASTORE" "$MOUNTPOINT" &
# Wait a while before accessing the mount point, to
# make sure the file system has been fully initialized.
while true; do
sleep $WAITTIME
if mount | grep -q "$MOUNTPOINT"; then break; fi
done
}
DO_UNMOUNT () {
if mount | grep -q "$MOUNTPOINT"; then
sleep $WAITTIME
if ! fusermount -u "$MOUNTPOINT"; then
FAIL_INTERNAL "$0:$LINENO: Failed to unmount the mount point?!"
fi
while true; do
sleep $WAITTIME
if ! mount | grep -q "$MOUNTPOINT"; then break; fi
done
fi
}
DO_MOUNT --verify-writes --compress=lzo
# Tests 1-8: Test hard link counts with mkdir(), rmdir() and rename(). {{{1
CHECK_NLINK () {
NLINK=`ls -ld "$1" | awk '{print $2}'`
[ $NLINK -eq $2 ] || FAIL "$0:$3: Expected link count of $1 to be $2, got $NLINK!"
}
FEEDBACK () {
MESSAGE "Running test $1"
}
# Test 1: Check link count of file system root. {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
CHECK_NLINK "$MOUNTPOINT" 2 $LINENO
# Test 2: Check link count of newly created file. {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
FILE="$MOUNTPOINT/file_nlink_test"
touch "$FILE"
CHECK_NLINK "$FILE" 1 $LINENO
CHECK_NLINK "$MOUNTPOINT" 2 $LINENO
# Test 3: Check link count of hard link to existing file. {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
LINK="$MOUNTPOINT/link_to_file"
link "$FILE" "$LINK"
CHECK_NLINK "$FILE" 2 $LINENO
CHECK_NLINK "$LINK" 2 $LINENO
CHECK_NLINK "$MOUNTPOINT" 2 $LINENO
unlink "$LINK"
CHECK_NLINK "$FILE" 2 $LINENO
CHECK_NLINK "$MOUNTPOINT" 2 $LINENO
# Test 4: Check link count of newly created subdirectory. {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
SUBDIR="$MOUNTPOINT/dir1"
mkdir "$SUBDIR"
if [ ! -d "$SUBDIR" ]; then
FAIL "$0:$LINENO: Failed to create subdirectory $SUBDIR!"
fi
CHECK_NLINK "$SUBDIR" 2 $LINENO
# Test 5: Check that nlink of root is incremented by one (because of subdirectory created above). {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
CHECK_NLINK "$MOUNTPOINT" 3 $LINENO
# Test 6: Check that non-empty directories cannot be removed with rmdir(). {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
SUBFILE="$SUBDIR/file"
touch "$SUBFILE"
if rmdir "$SUBDIR" 2>/dev/null; then
FAIL "$0:$LINENO: rmdir() didn't fail when deleting a non-empty directory!"
elif ! rm -R "$SUBDIR"; then
FAIL "$0:$LINENO: Failed to recursively delete directory?!"
fi
# Test 7: Check that link count of root is decremented by one (because of subdirectory deleted above). {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
CHECK_NLINK "$MOUNTPOINT" 2 $LINENO
# Test 8: Check that link counts are updated when directories are renamed. {{{2
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
ORIGDIR="$MOUNTPOINT/original-directory"
REPLDIR="$MOUNTPOINT/replacement-directory"
mkdir -p "$ORIGDIR/subdir" "$REPLDIR/subdir"
for DIRNAME in "$ORIGDIR" "$REPLDIR"; do CHECK_NLINK "$DIRNAME" 3 $LINENO; done
mv -T "$ORIGDIR/subdir" "$REPLDIR/subdir"
CHECK_NLINK "$ORIGDIR" 2 $LINENO
CHECK_NLINK "$REPLDIR" 3 $LINENO
# Tests 9-14: Write random binary data to file system and verify that it reads back unchanged. {{{1
TESTDATA="$ROOTDIR/testdata"
WRITE_TESTNO=0
while [ $WRITE_TESTNO -le 5 ]; do
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
NBYTES=$[$RANDOM % (1024 * 257)]
head -c $NBYTES /dev/urandom > "$TESTDATA"
WRITE_FILE="$MOUNTPOINT/$RANDOM"
cp -a "$TESTDATA" "$WRITE_FILE"
sleep $WAITTIME
if ! cmp -s "$TESTDATA" "$WRITE_FILE"; then
(sleep 1
echo "Differences:"
ls -l "$TESTDATA" "$WRITE_FILE"
cmp -lb "$TESTDATA" "$WRITE_FILE") &
FAIL "$0:$LINENO: Failed to verify $WRITE_FILE of $NBYTES bytes!"
fi
WRITE_TESTNO=$[$WRITE_TESTNO + 1]
done
# Test 15: Verify that written data persists when remounted. {{{1
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
DO_UNMOUNT
DO_MOUNT --nogc # <- important for the following tests.
if ! cmp -s "$TESTDATA" "$WRITE_FILE"; then
(sleep 1
echo "Differences:"
ls -l "$TESTDATA" "$WRITE_FILE"
cmp -lb "$TESTDATA" "$WRITE_FILE") &
FAIL "$0:$LINENO: Failed to verify $WRITE_FILE of $NBYTES bytes!"
fi
# Test 16: Verify that garbage collection of unused data blocks works. {{{1
FEEDBACK $TESTNO
TESTNO=$[$TESTNO + 1]
DO_UNMOUNT
FULL_SIZE=`ls -l "$DATASTORE" | awk '{print $5}'`
HALF_SIZE=$[$FULL_SIZE / 2]
DO_MOUNT
rm $MOUNTPOINT/* 2>/dev/null
DO_UNMOUNT
REDUCED_SIZE=`ls -l "$DATASTORE" | awk '{print $5}'`
[ $REDUCED_SIZE -lt $HALF_SIZE ] || FAIL "$0:$LINENO: Failed to verify effectiveness of data block garbage collection! (Full size of data store: $FULL_SIZE, reduced size: $REDUCED_SIZE)"
# Test 17: Verify that garbage collection of interned path segments works. {{{1
DO_MOUNT --nosync
SEGMENTGCDIR="$MOUNTPOINT/gc-of-segments-test"
mkdir "$SEGMENTGCDIR"
for ((i=0;i<=512;i+=1)); do
echo -ne "\rCreating segment $i"
touch "$SEGMENTGCDIR/$i"
done
echo -ne "\rSyncing to disk using unmount"
DO_UNMOUNT
FULL_SIZE=`ls -l "$METASTORE" | awk '{print $5}'`
HALF_SIZE=$[$FULL_SIZE / 2]
echo -ne "\rDeleting segments"
DO_MOUNT --nosync
rm -R "$SEGMENTGCDIR"
echo -ne "\rSyncing to disk using unmount"
DO_UNMOUNT
REDUCED_SIZE=`ls -l "$METASTORE" | awk '{print $5}'`
[ $REDUCED_SIZE -lt $HALF_SIZE ] || FAIL "$0:$LINENO: Failed to verify effectiveness of interned string garbage collection! (Full size of metadata store: $FULL_SIZE, reduced size: $REDUCED_SIZE)"
echo -ne "\r"
# Finalization. {{{1
CLEANUP
MESSAGE "All tests passed!"
# vim: ts=2 sw=2 et
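`CHECK_NLINK` in the test script scrapes the link count out of `ls -ld` output with awk; GNU `stat -c %h` reads the same field directly. A small sketch of the same check on a hard-linked file (assumes GNU coreutils):

```shell
# Sketch of the link-count check used by CHECK_NLINK above, reading the
# count with GNU `stat -c %h` instead of parsing `ls -ld` output.
d=$(mktemp -d)
touch "$d/file"
ln "$d/file" "$d/link"          # hard link bumps the count to 2
nlink=$(stat -c %h "$d/file")
[ "$nlink" -eq 2 ] || echo "unexpected link count: $nlink"
rm -r "$d"
echo "nlink=$nlink"
```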