Next: Implementation Up: Project Report Metadata file Previous: Problems   Contents

A simple design proposal

A metadata file system has a lot in common with a regular file system. In a regular file system the main problem is to design algorithms that can figure out where on the disk to put a given file, so that it can be found again and does not become fragmented. All of these problems also have to be solved in a metadata file system. In fact, a metadata file system could be implemented on top of a regular file system (and it should be, to save a lot of work; it might even be implemented on top of VFS to allow an arbitrary storage backend).
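
To illustrate the layering suggested above, here is a minimal Python sketch, assuming a plain directory as the backing store. The class name `MetadataLayer`, the `data/` subdirectory and the JSON index file are hypothetical choices, not part of the design. The point is only that placing bytes on disk is left entirely to the regular file system, while the metadata layer keeps nothing but a name-to-metadata index.

```python
import json
import os
import shutil
import tempfile


class MetadataLayer:
    """Hypothetical metadata layer stored inside an ordinary directory.

    The underlying file system decides where file data lives on disk;
    this class only keeps a JSON index mapping file names to metadata.
    File names are assumed to be flat (no path separators).
    """

    def __init__(self, backing_dir: str) -> None:
        self.data_dir = os.path.join(backing_dir, "data")
        self.index_path = os.path.join(backing_dir, "index.json")
        os.makedirs(self.data_dir, exist_ok=True)
        if os.path.exists(self.index_path):
            with open(self.index_path, encoding="utf-8") as f:
                self.index = json.load(f)
        else:
            self.index = {}

    def _save_index(self) -> None:
        # Write a temporary file and rename it, so a crash never leaves
        # a half-written index behind.
        tmp = self.index_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.index, f)
        os.replace(tmp, self.index_path)

    def add(self, name: str, data: bytes, metadata: dict) -> None:
        # Storing the bytes (allocation, fragmentation) is delegated to the
        # regular file system; only the metadata is managed here.
        with open(os.path.join(self.data_dir, name), "wb") as f:
            f.write(data)
        self.index[name] = metadata
        self._save_index()

    def read(self, name: str) -> bytes:
        with open(os.path.join(self.data_dir, name), "rb") as f:
            return f.read()

    def metadata(self, name: str) -> dict:
        return self.index[name]


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    layer = MetadataLayer(root)
    layer.add("song.ogg", b"\x00\x01", {"artist": "Unknown"})
    # A fresh instance re-reads the index from the backing directory.
    print(MetadataLayer(root).metadata("song.ogg"))  # {'artist': 'Unknown'}
    shutil.rmtree(root)
```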

So when designing a metadata file system, the main concern should not be how to do the allocation on the disk, as this work has already been done well by many others. Some of the concerns should instead be the problems described in section 3.4.

As a proof of concept I will describe the design of a simple yet powerful metadata file system which avoids the problems described in section 3.4; the design contains most of the features described in section 2.1.2. The design is not detailed in any respect; it only describes the basic idea. I assume a regular file system as a starting point and will not discuss file permissions or special files such as links, devices or named pipes. The file system is in no way suitable as a root file system.

Suppose a regular file system which is modified to contain virtual files and folders (see footnote 12). The root of the file system should contain the following five folders:

Schemas
This folder should contain all the schemas which are imported into the file system. A file copied into this folder should be treated as a schema; if the file system does not recognize the file as a schema, it should discard it and write a message to the kernel log. If a file is deleted, the schema should be removed from the file system, and if a file is read, the user should get the schema's documentation.
Hook-on-script
This folder should contain one folder for every schema which is added to the file system. If a file is copied into one of those folders, it should be treated as a hook-on script for the given schema. A hook-on script is simply an executable application which accepts input on stdin and produces output on stdout. The script should extract metadata for the schema from the data on stdin and write that metadata to stdout.
Import
This folder should contain a folder for every schema which exists in the file system. Each of these folders should contain a folder named auto, a folder named quarantine and a folder named hook-on-scripts.
Auto
If a file is copied into the auto folder, the minimum set of metadata allowed by the schema should be added to the file, auto-generated with valid values. After the file has been copied in, it should not be visible in the folder, but it should be possible to find it using the query interface.
Quarantine
If a file is copied in here, it should be associated with the current schema, but no values should be filled in. It should be possible to add, delete and edit metadata on the file. The file should stay in the folder, and it should not be visible through the query interface until it has been committed; after it has been committed, it should no longer exist in the quarantine area.
Hook-on-scripts
This folder should contain a folder for every hook-on script associated with the schema. If a file is copied into one of these folders, the corresponding hook-on script should be used to extract the metadata. The file should immediately be available through the query interface and should not be visible in the hook-on-script folder.
User
The user folder is the folder the user should generally use. It should contain a single virtual file named q. The following file operations should be possible on q:
write or append
Everything written to q is treated as a query. After a query has been written to q, the result should be visible in the folder containing q.
unlink (or delete)
If the file q is deleted, it should still exist, but the query should be cleared and the folder should be empty (virtual files and folders will still exist).
read
Reading q should return the current query.
The regular files which exist in the folder because of a query should be treated as files are treated in any other file system. This means they can be deleted, appended to, read and so on. If a new file is added, all the information which can be extracted from the search query should be recorded in the file's metadata. If a new folder is added, it should be treated as a virtual folder containing its own virtual file q, and it should work in the same way as the current folder.
Processes
The processes folder should contain a folder for every process, named by the process ID, and only the process with that particular process ID should have access to the folder. In all other respects it should work exactly like the user folder.
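
The Schemas folder and the auto import folder described above can be sketched as follows. The design does not define a schema format, so this Python sketch assumes a hypothetical JSON format with a `name` and a `fields` map, where each field has a `type` and optional `required` and `default` entries; Python's `logging` module stands in for the kernel log. `parse_schema` models copying a file into Schemas (unrecognized files are rejected and logged), and `auto_metadata` models the auto folder by filling in only the required fields with valid default values.

```python
import json
import logging
from typing import Optional

log = logging.getLogger("metadatafs")  # stand-in for the kernel log

# Hypothetical default value per field type, used when a schema gives none.
TYPE_DEFAULTS = {"str": "", "int": 0, "bool": False}


def parse_schema(raw: bytes) -> Optional[dict]:
    """Return the schema if `raw` is recognizable as one, else log and return None."""
    try:
        schema = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        log.warning("rejected file: not valid JSON")
        return None
    if (not isinstance(schema, dict)
            or not isinstance(schema.get("name"), str)
            or not isinstance(schema.get("fields"), dict)):
        log.warning("rejected file: missing 'name' or 'fields'")
        return None
    for field, spec in schema["fields"].items():
        if not isinstance(spec, dict) or spec.get("type") not in TYPE_DEFAULTS:
            log.warning("rejected schema %r: bad field %r", schema["name"], field)
            return None
    return schema


def auto_metadata(schema: dict) -> dict:
    """Minimal valid metadata: only the required fields, filled with defaults."""
    result = {}
    for field, spec in schema["fields"].items():
        if spec.get("required", False):
            result[field] = spec.get("default", TYPE_DEFAULTS[spec["type"]])
    return result


if __name__ == "__main__":
    music = parse_schema(b'{"name": "music", "fields": {'
                         b'"artist": {"type": "str", "required": true},'
                         b'"year": {"type": "int"}}}')
    print(auto_metadata(music))       # {'artist': ''}
    print(parse_schema(b"not json"))  # None (and a warning is logged)
```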
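
A hook-on script only needs stdin and stdout, so running one is a single process invocation. The sketch below assumes, since the design does not specify it, that the script prints one `key=value` pair per line and that a non-zero exit status means the extraction failed; `run_hook_on_script` is a hypothetical name.

```python
import subprocess
import sys


def run_hook_on_script(command: list, data: bytes, timeout: float = 10.0) -> dict:
    """Pipe `data` to the script's stdin and parse its stdout as metadata.

    Assumed output format: one `key=value` pair per line; other lines are
    ignored. check=True raises CalledProcessError on a non-zero exit status.
    """
    proc = subprocess.run(command, input=data, capture_output=True,
                          timeout=timeout, check=True)
    metadata = {}
    for line in proc.stdout.decode("utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    return metadata


if __name__ == "__main__":
    # A stand-in "script" that counts the lines of its input.
    script = [sys.executable, "-c",
              "import sys; print('lines=%d' % len(sys.stdin.read().splitlines()))"]
    print(run_hook_on_script(script, b"first\nsecond\n"))  # {'lines': '2'}
```

Because the contract is only stdin and stdout, a hook-on script can be written in any language, which is what makes copying an arbitrary executable into the Hook-on-script folder workable.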
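
The quarantine area described above behaves like a small state machine: a file enters with an empty metadata set, is edited in place, and moves to the queryable index on commit. In this Python sketch the class and method names are hypothetical, the design does not say how a commit is triggered, and refusing to commit a file that lacks the schema's required fields is an added assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class QuarantinedFile:
    name: str
    schema: str
    metadata: Dict[str, str] = field(default_factory=dict)  # no values filled in yet


class Quarantine:
    """Hypothetical model of Import/<schema>/quarantine for one schema."""

    def __init__(self, schema: str, required: Set[str]) -> None:
        self.schema = schema
        self.required = required
        self.pending: Dict[str, QuarantinedFile] = {}  # listed in the quarantine folder
        self.index: Dict[str, QuarantinedFile] = {}    # visible to the query interface

    def copy_in(self, name: str) -> None:
        # The file is tied to the schema but carries no metadata values.
        self.pending[name] = QuarantinedFile(name, self.schema)

    def set_value(self, name: str, key: str, value: str) -> None:
        # Covers both adding a new key and editing an existing one.
        self.pending[name].metadata[key] = value

    def delete_value(self, name: str, key: str) -> None:
        del self.pending[name].metadata[key]

    def commit(self, name: str) -> None:
        # Assumption: a commit is refused until every required field has a value.
        missing = self.required - set(self.pending[name].metadata)
        if missing:
            raise ValueError(f"cannot commit {name}: missing {sorted(missing)}")
        # Leaving the quarantine area and entering the index happen together.
        self.index[name] = self.pending.pop(name)
```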
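
The semantics of the virtual file q can be modelled in memory. This Python sketch assumes a hypothetical query syntax of space-separated `key=value` terms that must all match, and it assumes that a folder created inside a query folder refines its parent's query; the design only says the new folder works like the current one. File names and metadata live in a plain dictionary in place of a real index, and all names are hypothetical.

```python
from typing import Dict, List, Optional


class QueryFolder:
    """Hypothetical in-memory model of a User folder and its virtual file q."""

    def __init__(self, store: Dict[str, dict],
                 parent: Optional["QueryFolder"] = None) -> None:
        self.store = store      # file name -> metadata, shared by every folder
        self.parent = parent
        self.query = ""
        self.subfolders: Dict[str, "QueryFolder"] = {}

    @staticmethod
    def _parse(query: str) -> Dict[str, str]:
        # Assumed syntax: space-separated key=value terms that must all match.
        return dict(term.split("=", 1) for term in query.split() if "=" in term)

    def _terms(self) -> Dict[str, str]:
        # Assumption: a subfolder refines the query of the folder it lives in.
        terms = self.parent._terms() if self.parent is not None else {}
        terms.update(self._parse(self.query))
        return terms

    def write_q(self, text: str, append: bool = False) -> None:
        self.query = f"{self.query} {text}".strip() if append else text

    def read_q(self) -> str:
        return self.query

    def unlink_q(self) -> None:
        # q itself survives; only the query, and with it the results, is cleared.
        self.query = ""

    def listdir(self) -> List[str]:
        names = ["q"] + sorted(self.subfolders)  # virtual entries are always present
        if self.query:
            terms = self._terms()
            names += sorted(name for name, md in self.store.items()
                            if all(md.get(k) == v for k, v in terms.items()))
        return names

    def create_file(self, name: str) -> None:
        # Everything the query states is recorded in the new file's metadata.
        self.store[name] = self._terms()

    def mkdir(self, name: str) -> "QueryFolder":
        sub = QueryFolder(self.store, parent=self)
        self.subfolders[name] = sub
        return sub
```

For example, writing `artist=X` to q lists only files whose artist is X, and a file created in that folder automatically gets `artist=X` in its metadata, matching the rule that everything extractable from the query is recorded on a new file.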
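
The access rule for the Processes folder reduces to comparing the first path component below that folder with the caller's process ID. A minimal Python sketch, assuming the folder is mounted at `/Processes` and that the caller's PID is passed in explicitly (a kernel would take it from the current task); `may_access` is a hypothetical name.

```python
import posixpath


def may_access(path: str, pid: int, root: str = "/Processes") -> bool:
    """Hypothetical check: only process <pid> may use <root>/<pid>/...

    Paths outside the per-process folders are not restricted here.
    """
    rel = posixpath.relpath(path, root)
    if rel == "." or rel.startswith(".."):
        return True  # the Processes folder itself, or somewhere else entirely
    owner = rel.split("/", 1)[0]
    return owner == str(pid)
```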

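It would be possible to implement this design without breaking backwards compatibility with existing applications, since all interaction with the file system happens through ordinary file operations such as copying, reading, writing and deleting files.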


2007-11-09