Next: Implementation
Up: Project Report Metadata file
Previous: Problems
Contents
A metadata file system has a lot in common with a regular file
system. In a regular file system the main problem is to design
algorithms which can figure out where on the disk to put a given file,
how to find it again, and how to prevent it from becoming fragmented.
All of those problems also have to be solved in a metadata
file system. In fact, a metadata file system could be implemented on top
of a regular file system (and it should be, to save a lot of work; it might
even be implemented on top of VFS to allow an arbitrary storage backend).
So when designing a metadata file system, the main concern should not
be how to do the allocation on the disk, as this work has already
been done well by many others. The main concerns should
instead be the problems which have been described in section
3.4.
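The idea of layering a metadata file system on top of a regular one can be sketched as follows. This is a minimal Python illustration, not part of the design itself: the class MetadataStore and the index file name .metadata.json are hypothetical. File contents are written as ordinary files into a backing directory, so placement on the disk and fragmentation remain the underlying file system's problem, while the metadata is kept in a separate JSON index.

```python
import json
import os
import tempfile


class MetadataStore:
    """Hypothetical metadata layer that keeps all file data in a regular directory.

    Placement on disk, lookup and fragmentation are left entirely to the
    underlying file system; this layer only maintains a metadata index.
    """

    INDEX_NAME = ".metadata.json"  # assumed location of the metadata index

    def __init__(self, backing_dir):
        self.backing_dir = backing_dir
        self.index_path = os.path.join(backing_dir, self.INDEX_NAME)
        self.index = {}
        if os.path.exists(self.index_path):
            with open(self.index_path, encoding="utf-8") as f:
                self.index = json.load(f)

    def _path(self, name):
        return os.path.join(self.backing_dir, name)

    def add(self, name, data, metadata):
        # An ordinary write: the regular file system decides where the bytes go.
        with open(self._path(name), "wb") as f:
            f.write(data)
        self.index[name] = dict(metadata)
        self._save()

    def read(self, name):
        with open(self._path(name), "rb") as f:
            return f.read()

    def metadata(self, name):
        return self.index[name]

    def _save(self):
        with open(self.index_path, "w", encoding="utf-8") as f:
            json.dump(self.index, f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = MetadataStore(d)
        store.add("song.ogg", b"audio bytes", {"artist": "Unknown"})
        # A fresh instance finds both the data and the metadata again.
        reopened = MetadataStore(d)
        print(reopened.read("song.ogg"), reopened.metadata("song.ogg"))
```

Keeping the index in a single file is the simplest choice for a sketch; a real implementation would need atomic updates and a faster lookup structure.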
As a proof of concept I will describe the design of a simple yet
powerful metadata file system which avoids the problems described in
section 3.4 and contains most of the features described in
section 2.1.2. The design is not detailed in any way; it only
describes the basic idea. I assume a regular file system as a
starting point, and will not discuss file permissions or special files
such as links, devices or named pipes. The file system is in no way
suitable as a root file system.
Suppose a regular file system which is modified to contain virtual
files and folders (see footnotes 1 and 2). The root of the file system
should contain the following folders:
- Schemas
- This folder should contain all the schemas which are
imported into the file system. A file copied into it should be
treated as a schema; if the file system does not recognize the
file as a schema, it should discard it and write a message
to the kernel log. If a file is deleted, the schema should be
removed from the file system, and if it is read, the user should get
the schema's documentation.
- Hook-on-script
- This folder should contain one folder for every
schema which is added to the file system. A file copied into
one of those folders should be treated as a hook-on script for
the given schema. A hook-on script is simply an executable
application which accepts input on stdin and produces output
on stdout. The output on stdout should be metadata for the
schema, extracted by the script from the data on stdin.
- Import
- This folder should contain a folder for every schema which
exists in the file system. Each of those folders should contain a
folder named auto, a folder named quarantine and a folder named
hook-on-script.
- Auto
- If a file is copied into the auto folder, the
minimum set of metadata allowed by the schema should be added
to the file, auto-generated with valid values. After the file
has been copied in, it should not be visible in the folder, but
it should be possible to find it using the query interface.
- Quarantine
- If a file is copied in here, it should be
associated with the current schema, but no values should be
filled in. It should be possible to add, delete and edit metadata
on the file. The file should stay in the folder and should
not be visible through the query interface until it has been
committed; after it has been committed, it should no longer
exist in the quarantine area.
- Hook-on-script
- This folder should contain a folder for every
hook-on script which is associated with the schema. If a file
is copied into one of these hook-on folders, the given hook-on
script should be used to extract the metadata. The file should
instantly be available through the query interface, and should
not be visible in the hook-on folder.
- User
- The user folder is the folder the user should generally
use. It should contain a single virtual file named q. The
following file operations should be possible on q:
- write or append
- Everything which is written to q is
treated as a query. After a query has been written to q, the
result should be visible in the folder in which q exists.
- unlink (or delete)
- If the file q is deleted, it should
still exist, but the query should be cleared and the folder
should be empty (though virtual files and folders will still exist).
- read
- Reading q should return the current query.
The regular files which exist in the folder because of a query
should be treated as files are treated in any other file system.
This means they can be deleted, appended to, read and so on. If a
new file is added, all the information which can be extracted from
the search query should be recorded in the file's metadata. If a new
folder is added, it should be treated as a virtual folder: it
should contain a virtual file q and work in the same way as the
current folder.
- Processes
- The processes folder should contain a folder for
every process, named by the process id, and only the process
with that particular process id should have access to the
folder. In all other ways it should work exactly like the user
folder.
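The three import paths described for each schema folder can be modelled in a few lines. The sketch below is hypothetical throughout: the names Schema and ImportFolder, the representation of a schema as required fields with a default value each, the rule that a commit is refused while required fields are missing, and the key=value output format of hook-on scripts are all assumptions made for illustration. Only the use of stdin and stdout by hook-on scripts comes from the design.

```python
import subprocess
import sys


class Schema:
    """Hypothetical schema: the minimum required fields and a valid default for each."""

    def __init__(self, name, required):
        self.name = name
        self.required = dict(required)


class ImportFolder:
    """Sketch of Import/<schema>/ with its auto, quarantine and hook-on-script paths."""

    def __init__(self, schema):
        self.schema = schema
        self.committed = {}   # name -> metadata; reachable through queries
        self.quarantine = {}  # name -> metadata; listed only in quarantine

    def copy_to_auto(self, name):
        # Attach the schema's minimum metadata, auto-generated with valid values.
        self.committed[name] = dict(self.schema.required)

    def copy_to_quarantine(self, name):
        # Associated with the schema, but no values are filled in yet.
        self.quarantine[name] = {}

    def commit(self, name):
        meta = self.quarantine[name]
        # Assumption: a commit is refused until every required field has a value.
        missing = sorted(set(self.schema.required) - set(meta))
        if missing:
            raise ValueError(f"{name}: missing {missing}")
        # Leaves quarantine and becomes visible to the query interface.
        self.committed[name] = self.quarantine.pop(name)

    def copy_to_hook_on(self, name, data, script_argv):
        # The hook-on script gets the file data on stdin and prints metadata on stdout.
        result = subprocess.run(script_argv, input=data, capture_output=True, check=True)
        meta = {}
        # Assumed output format: one "key=value" pair per line.
        for line in result.stdout.decode().splitlines():
            key, sep, value = line.partition("=")
            if sep:
                meta[key.strip()] = value.strip()
        self.committed[name] = meta


if __name__ == "__main__":
    area = ImportFolder(Schema("text", {"lines": "0"}))
    # Hypothetical hook-on script: counts the lines of whatever arrives on stdin.
    count_lines = [sys.executable, "-c",
                   "import sys; print('lines=%d' % len(sys.stdin.read().splitlines()))"]
    area.copy_to_hook_on("notes.txt", b"first\nsecond\n", count_lines)
    area.copy_to_quarantine("draft.txt")
    area.quarantine["draft.txt"]["lines"] = "12"
    area.commit("draft.txt")
    print(area.committed)
```

Because a hook-on script is just any program using stdin and stdout, it can be written in any language, which is why the sketch simply runs it as a subprocess.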
It would be possible to implement this without breaking backwards
compatibility with existing applications.
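The behaviour of the file q relies only on ordinary file operations (write, read, unlink and directory listing), which is why existing applications need no changes. The following in-memory Python model illustrates the semantics described for the user folder. The class QueryFolder and the query syntax (whitespace-separated key=value terms which must all match) are assumptions, since the design does not fix a query language.

```python
class QueryFolder:
    """Hypothetical in-memory model of a virtual folder containing the file q."""

    def __init__(self, files):
        self.files = files  # name -> metadata dict, shared with the rest of the store
        self.query = ""

    @staticmethod
    def _terms(query):
        # Assumed query syntax: whitespace-separated key=value terms, all must match.
        terms = {}
        for token in query.split():
            key, sep, value = token.partition("=")
            if sep:
                terms[key] = value
        return terms

    def write_q(self, text, append=False):
        # Writing to q sets the query; appending extends it with further terms.
        self.query = f"{self.query} {text}".strip() if append else text

    def read_q(self):
        # Reading q returns the current query.
        return self.query

    def unlink_q(self):
        # q keeps existing; only the query, and therefore the results, are cleared.
        self.query = ""

    def listdir(self):
        # q is always listed; matching files appear only while a query is set.
        if not self.query:
            return ["q"]
        terms = self._terms(self.query)
        hits = [name for name, meta in self.files.items()
                if all(meta.get(k) == v for k, v in terms.items())]
        return ["q"] + sorted(hits)

    def create(self, name):
        # A new file receives every value that can be read off the current query.
        self.files[name] = self._terms(self.query)


if __name__ == "__main__":
    store = {"a.ogg": {"genre": "rock", "artist": "X"},
             "b.ogg": {"genre": "rock", "artist": "Y"}}
    user = QueryFolder(store)
    user.write_q("genre=rock")
    print(user.listdir())
    user.write_q("artist=X", append=True)
    user.create("c.ogg")
    print(user.listdir(), store["c.ogg"])
    user.unlink_q()
    print(user.listdir(), repr(user.read_q()))
```

In a real file system these methods would be the handlers behind write(2), read(2), unlink(2) and readdir on the virtual folder; an application such as a shell or file manager only ever sees ordinary files.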
2007-11-09