Changing libgnomecups For Multiple Evolution Users
Happy National Sys Admin Appreciation Day!
ESX iSCSI Basic Configuration from the CLI
Tape Rants and Raves: LTO4 Rules
apparently you aren't dead until you start to stink
Charlie Goes to Candy Mountain
Seattle Scalability Conference, Pt II
Announcing the Hyperic VMware Appliance
Cygnal - When Red5 just won't cut it for an RTMP server
IBM's CoScripter - automating web-based processes
AjaxWindows.com - Another Michael Robertson company
p0f passive fingerprinting IDS
Talking storage systems with Sun's ZFS team
SproutCore - a MVC scaffolding for actual Application development
Skype protocol obfuscation layer
Microsoft Silverlight and the Mono team at Novell join up to create the Moonlight project
Bitlbee - bridge IM client networks to an IRC channel.
EJBCA - The J2EE Certificate Authority
Mcell 3.5" drive has 1GB of DDR RAM; 2.5" drive == 110MB/s transfer rates
OpenSolaris Xen domU with a linux dom0
Tentakel: distributed command execution
Ganeti: Opensource virtual server management software for Xen
Seamless dynamic image resizing
Mono and XPCOM scripting VirtualBox
podbrix young woz and jobs playset
Woz gets a speeding ticket for 104mph in a Prius
Google Starts Shared Storage Service
Storm Worm DDoSes scanning machines
Defendant wins access to the Intoxilyzer 5000EN Breathalyzer source code
How to replace graffiti 2 with the original graffiti on a Palm
customizegoogle.com - a firefox plugin for customizing google
iSER - iSCSI Extensions for RDMA
RDMA - Remote Direct Memory Access
HP $16billion acquisition story
Gentoo Foundation disappearing?
Novell Xen HVM PV Driver Pack for Windows
Font rasterization, and why Microsoft's methods are broken.
scanr - turn your camera phone into a scanner, copier, and fax.
MPX - the multi-point x server project site
MPX - Multi-Point Touch extension for X.org
RubyOS/X - one click Mac ruby deployment.
Xen guest implemented as a paravirt_ops backend
Checkpoint VPN client causes airport to spin out of control in Mac OS/X 10.4.8
ext3 checksums == Internal RObustNess (IRON) file system
Joey@Mozilla Labs == Mobile synchronized content
smugmug.com == 300TB of S3 storage, at $0.15/G/month, isn't that $45,000 a month?
ELC tabTerm for Mac OS/X, sadly discontinued as leopard will have tabbed terminals
Microsoft iSCSI boot with MPIO
Boot Win2k3 from the Microsoft iSCSI initiator.
vmgbd - vmware generic block device patch
vmware-bdwrapper - block device ioctl wrapper for VMWare
Google faces brain drain as anniversaries hit
Xen on Centos host running FreeBSD guest
Quercus - pure java implementation of php 5
How many engineers does it take to turn on a light bulb?
Scalable Distributed Data Structures - P2P based storage
When Sysadmins Ruled the Earth
Dynamic object binding (ld.so) - a pretty darned good blog post
HOWTO: Debugging a remote Windows HVM under Xen using WinDBG
Intuit Enterprise Solutions embraces Linux
VMWare Fusion 1.0b4 with Unity now available
RedHat to release Xen PV drivers with RHEL 5.1
Akamai real-time Internet weather map
Parallels Coherence, meet VMWare Unity.
TBO.com looking for a Django/Python developer in Tampa.
Tracking Assets in the Production of 'Final Fantasy : The Spirits Within'
10 signs we are in a tech bubble
Funny story behind the latest AACS key crack
Alternative Fuels lead to Tequila shortage
Trickle - a lightweight userspace LD_PRELOAD bandwidth shaper
dwm - dynamic window manager in under 2k lines of C
xmonad - a tiling window manager written in haskell
Xen-users post about Xen network performance
OSCircular - boot anonymous OSes from the Internet under Xen via HTTP
Frysk - system monitoring and debugging tool
OpenAIS - Standards-based Cluster Framework
Conga - manage RHCS cluster services for Xen VMs
KeyJnote - a python based opengl presentation system
SystemTap - a DTrace workalike for Linux
EE Times Under the Hood - inside the Prius
Embedding flash while retaining standards.
ATLAS RepDB - spread based postgres repl
Migraines may be causing brain damage
Sun mulls deeper opensource dive
6502asm.com - a 6502 assembler and emulator written in javascript.
The Alameda Weehawken Burrito Tunnel
Project Honeypot files $1billion John Doe lawsuit against spammers
All your base are belong to google.
Lisp is not an acceptable Lisp
High-Performance SSH/SCP patches
DNBD - distributed network block device
Lucene Hadoop (previously Nutch NDFS) on Amazon S3
Time Series and Stochastic Processes
Software Update for Web Folders - add Web Folders to your Windows 2003 server
Japanese subculture of hacking Prius
XenSource caters to Win2k laggards
DJB article on secure machine identification with a central PKI model
Linux 2.6 swap "files" are just fine
Installing OpenSUSE 10.1 in a domu with installer
AjaxTerm - another Ajax terminal
How to get sound working with a thin-client Linux setup (LTSP)
g4u - Harddisk Image Cloning for PCs
Using the nVidia binary driver under Xen on Debian Etch
German court finds Microsoft's FAT Patent null and void
Mayans must re-sanctify temple after Bush visit
IEs 4 Linux - a simple script to have all 3 versions of IE running under Linux
Google's Summer of Code 2007 - About to start
CruiseControl - a framework for a continuous build process
Suhosin - Hardened PHP project
Vista Activation Cracked by Brute Force
National ID Card Regulations Issued
Running Lucene Hadoop FS on Amazon EC2
Matt Dillon returns with plans for an HA Clustering Filesystem for DragonflyBSD
Comparison of opensource configuration management software
What happened to Wall Street today
lssconf - large scale system configuration
Stanford Collective Group - virtual appliance computing infrastructure
Jumbo frame clean ethernet gear
Apparently Fraunhofer is the wrong company to license MP3 tech from.
NetDirector - opensource config management that I keep forgetting about and running into again.
Growing a brain in Switzerland
OpenQRM Technical Overview Whitepaper from 11/08/06
MPAA steals code, violates linkware license.
Twingly screensaver - neat RSS visualization
VMWare Virtual Desktop Infrastructure (VDI) Brokers
IBM System p5 560Q to run 320 virtualized x86 linux images...
Xandros management something.. full of marketing doublespeak
Ceph - another GlusterFS/GFarm/.. petabyte scale filesystem
nVidia CUDA - use your GPU for supercomputing.
NSpluginwrapper: use 32bit firefox plugins with 64bit firefox versions.
Intel Web 2.0 Technology Development Kit
EV SSL is ineffective in deterring users
Firefox3 will support offline webapps
Novell offers PV drivers for Xen.
One of the many reasons why Linux is more secure than Windows
A performance comparison between VMWare ESX and Xen Hypervisors
Solaris 10 and 11 with telnet/login enabled are remotely exploitable.
VMWare Fusion Public Beta available for Mac
MindMeister - another web2.0 mind mapping site
3d model shows big body of water in earth's mantle.
Merged Xen/OpenVZ kernel patches worked back in Aug' 2005. Any bets on now?
Ultra-slim Credit Card-Sized Bluetooth Keyboard
open-management.com - Open Management Consortium
More on the 16qubit quantum computer.
Hyperic HQ - an up-and-coming OpenSource platform for centralized IT management
go2web20.net - a web2.0 directory
VMWare Fusion for Mac OS/X - virtual 3d gaming
IntelliJ IDEA features Ruby and Ruby on Rails support.
Deploying 3d thin-clients with beryl in Largo
Fabrice Bellard opensources kqemu
QEMU's kqemu internal workings, documented.
vi commands in all of your cocoa apps
The 2 minute extended version of the GoDaddy Superbowl Commercial
VPX Redline Putting People in the Hospital
Confidential Microsoft memo: "Let's steal Java"
Setup Xen within 20 minutes with grml
Vista dropped OpenGL - legacy games, benchmarks, and CAD programs severely impacted
vmdk2vhd - convert vmware's vmdk images to microsoft's vhd format
GT4 Key Concepts (A Globus Primer)
Microsoft's WPF/E plugin goes Firefox, Safari, etc.
MIT's latest 400 page report: The Future of Geothermal Energy
The Intergovernmental Panel on Climate Change Report
An introduction to lguest (aka lhype, aka RustyVizor)
Using the OpenSolaris Mercurial Repository
Boston Mooninite Fetching $5k on eBay
Storage Virtualization becoming a reality
Google Talk video on Debmarshal
Debmarshal - a Google code hosted project for building and maintaining your own Debian distributions
eJamming - collaboratively jam live with other musicians online
Michael Dell is CEO of Dell again. Welcome to Dell 2.0.
Technorati WTF launched - another Digg clone
Jon Udell on Calendar cross-publishing concepts
How to install Enomalism v0.6 on Ubuntu
Microsoft's Jim Gray is missing at sea.
Seagate D.A.V.E - a wifi/bluetooth 20G drive
Apple's $1.99 802.11n enabler for MacBooks for sale now.
xrdp - an opensource X11 RDP server
Hypervisor-Based Fault-Tolerance Whitepaper
Allfreecalls.net - make free calls through Iowa
HOWTO - Make an iNoPhone (Apple Mouse Bluetooth headset)
Another Asterisk under Xenu HOWTO
wefeelfine.org = rss visualization emotion
Microsoft Patents stolen BlueJ
GoogleTV beta or silly shenanigans?
Krugle - code search for developers
Apple removes LUN masking from Xserve RAID
xsfs - xenstored FUSE filesystem
PlatForms 2007 kicks off - 9 different web devel platforms fight head to head.
Google maps Java app = Goggles flight sim
Java Posse Episode 100 - a Google Talk video
Xen Remote Management Interfaces
VMWare Infrastructure 3 demo video
swivel - a web 2.0 visualization site
manyeyes - IBM's beta visualization site
HSDPA - UMTS' answer to EVDO Rev A
ElasticLive - Enomaly's mashup of Amazon EC2 and Globus Virtual Workspaces.
After filling a CornFS volume for a couple of days now, I found a few problems that really begged for another release.
I'm still building cornfs with debug flags and under gdb to catch any segfaults in the new caching code. Sure enough, it found a segfault or two, which meant cleaning up my pointer handling a bit. cache_insert() now works for a rather huge cache_inventory() run without incident.
There was also a bug with the statfs() setup in the cache upload function. Instead of statfs()ing the /data/cornfs/import/{servername} directory, it was handily using /data/cornfs/import. All of the servers appeared to have the same remaining free space, which caused the last two servers to fill to the brim.
Like I said, there will likely be some rapid releases this week as I stumble upon more nits to pick.
For now you can download cornfs-v0.0.5.1.tar.bz2 and have at it.
I've been working on cornfs this weekend a bit to speed things up.
With the help of gprof and gcc -pg, I found that the caching routines were causing a huge performance hit. Every read() and write() was doing a linear linked list search through every cached entry. This is fixed.
Along the way, I found it difficult to debug things with one huge cornfs.c source file. So I've split that up into numerous .c source files to fix this.
I also updated the Makefile to build on its own, rather than under fuse/examples as before. It is now FUSE 2.5.2 friendly and compiles against the version 22 API; I'll see about adding the version 25 API functions shortly.
So, download cornfs-v0.0.5.0.tar.bz2, extract, and build with make.
A few folks have mentioned they were playing with cornfs via private email. With this latest version, and NFS over ssh, NKS is finally running this in a production environment.
Look forward to some rapid updates here in the near future.
I've had serious problems using shfs, sshfs, and sfs. The first two fall apart under load, and the latter is a nightmare to get working in our environment (a PAM nightmare, that is).
Rather than dealing with something crazy, I decided to go back to a faithful old standard: NFS. As the remote storage nodes are accessible only via ssh, ssh was the ideal transport for the NFS mounts.
How do you do this? With a little port trickery and some inittab craziness to hold the tunnels up.
NFS v3 and newer have a TCP transport mode that makes it possible to tunnel over ssh. Older versions of NFS use a UDP-based ONC RPC transport. Make sure you have kernel support for TCP and NFS v3 before you continue.
On the remote nodes, install NFS:
# apt-get install nfs-kernel-server nfs-common portmap
Then set up an exports file sharing something to localhost:
# echo "/export localhost(rw,async,insecure,no_root_squash)" >> /etc/exports
We need mountd to start on a known port so we can set up the ssh tunnel from the master; the "-p" flag is used for this. Debian keeps the RPCMOUNTDOPTS flags in /etc/default/nfs-kernel-server, easily updated with this perl one-liner:
# perl -pi -e 's/^(RPCMOUNTDOPTS)=.*$/$1="-p 32767"/' /etc/default/nfs-kernel-server
It's also a good idea to block portmap requests from anything but localhost with tcpwrappers, just in case your firewall rules happen to be down for some reason.
# echo "portmap: LOCAL" >> /etc/hosts.allow
Now restart things and make sure the mountpoint is being exported:
# /etc/init.d/nfs-kernel-server stop
# /etc/init.d/nfs-common stop
# /etc/init.d/portmap stop
# /etc/init.d/portmap start
# /etc/init.d/nfs-common start
# /etc/init.d/nfs-kernel-server start
# rpcinfo -p localhost
# showmount -e localhost
The remote server is now ready to mount. Return to your central master cornfs server, which will act as the client, and set up an ssh tunnel.
Step 1: Install the NFS client tools
# apt-get install nfs-common
Step 2: Set up key trust with the remote server:
# ssh-keygen -f ~/.ssh/id_dsa-cornfs -P '' -t dsa -b 1024
# cat ~/.ssh/id_dsa-cornfs.pub | ssh remoteserver 'mkdir -p ~/.ssh; cat - >> ~/.ssh/authorized_keys'
Step 3: Set up the SSH tunnel with an inittab respawn
# echo 'N0:23:respawn:/usr/bin/ssh -c blowfish -L 10000:localhost:2049 -L 11000:localhost:32767 remoteserver vmstat 300' >> /etc/inittab
# telinit q
Now you should see an ssh tunnel running in a process listing. Check your system logs to see if there are any problems.
Step 4: Add fstab entries for NFS:
# echo 'localhost:/export /data/cornfs/import/remoteserver nfs rw,bg,soft,port=10000,mountport=11000,tcp 0 0' >> /etc/fstab
# mount /data/cornfs/import/remoteserver
You should now see your remote server /export filesystem mounted under /data/cornfs/import.
Each remote server will need to have a unique nfs and mountd port assignment. Repeat steps 3 and 4 for each.
I started at 10000 and 11000 and worked my way up from there. The next server's port assignments are 10001 and 11001, etc.
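The bookkeeping is easy to script. Here is a sketch that just prints the inittab and fstab lines for a list of remote servers; the server names, ssh options, and paths are illustrative, so adjust to taste:

```shell
# Emit per-server inittab and fstab lines, assigning forwarded
# nfs/mountd ports from 10000/11000 upward.
gen_tunnel_lines() {
  i=0
  for server in "$@"; do
    nfs_port=$((10000 + i))
    mnt_port=$((11000 + i))
    # inittab entry: respawn an ssh tunnel forwarding nfs (2049) and mountd (32767)
    echo "T$i:23:respawn:/usr/bin/ssh -c blowfish -L $nfs_port:localhost:2049 -L $mnt_port:localhost:32767 $server vmstat 300"
    # matching fstab entry for the tunneled mount
    echo "localhost:/export /data/cornfs/import/$server nfs rw,bg,soft,port=$nfs_port,mountport=$mnt_port,tcp 0 0"
    i=$((i + 1))
  done
}
gen_tunnel_lines nodeA nodeB nodeC
```

Redirect the output into /etc/inittab and /etc/fstab (and `telinit q`) once you're happy with it.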
This works surprisingly well, and appears to be quite stable (far more stable than the other alternatives).
That's not to say things are as fast as they could possibly be, but it works.
Latest version: v0.0.5.0
The braindump for CORNFS explains many things about this project.
CORNFS is an attempt at creating a distributed filesystem that mirrors N copies of files across a group of M number of servers. Everything in CORNFS is stored as a file.
At any time, it is possible to reconstruct the entire filesystem via a simple overlay rsync from the remote filesystems - there is no "special database" to worry about.
Rather than mirroring at the volume or block level, CORNFS mirrors at the file level, tracking what servers a file is mirrored on. CORNFS works with locally cached copies of files and a central metadata state directory.
Extended attributes are used to mark metadata state files with information CORNFS uses to track the mirrors for a particular file, as well as cached files that are marked as "dirty" (for copying back to remote servers when a cached file is modified).
As files are written, the servers with the most available disk space are used for new files (a braindead-simple algorithm for the moment). When a cached file is modified, the file is copied back to its mirrors (or to new mirrors should a server be unavailable). CORNFS keeps metadata centrally to maintain a sane filesystem state: every remote server's metadata state is known by the central server, and the central server's metadata state is authoritative. Remote servers may go offline; when they come back online, any files that were updated while they were unavailable will already have been removed from that server's entries in the central metadata and will not be referred to (such "orphaned files" will need to be pruned periodically).
As a last resort, the master's cached copy is authoritative. If the mirrors cannot be written to, the cache file will remain dirty and will not be expired.
This is a production running release, as used by my employer today.
The history of development so far:
The first (broken) release.
A number of fixes make this version _usable_. There are most definitely corner cases
that have not been dealt with yet, though it survives an rsync/rm well now.
Adds partial read()s while the copy is underway during an open() (until I figure out
how to spawn a pthread() for the copy, this does not really do much yet).
Added pthread_mutex_lock(&corn_copy_lock) to copy_file. Added corn_magic and
USE_MAGIC wrappers for magic file identification.
The dynamic expiring cache code is now present. Added cache_inventory(),
cache_insert(), cache_update(), cache_expire_to_limit(), cache_rename(),
and cache_remove().
Added S_ISREG check to read() and write(). Any non-regular file read/write calls
are now mapped correctly to state files. Also fixed cache_insert.
Turned off debugging, removed hardcoded size limit.
Remove stat()ing of cached files, replace with cache_exists(), particularly in read()
and write(). Move as many dirty checks to corn_cache as possible. cache_mark_dirty(),
cache_mark_clean(), cache_is_clean(). Fixed some more mallocs.
Add dirty check to cache_expire
(do not expire something from cache if it does not have a good mirror!!!)
Add copy_file_thread, copy_file_wait, and copy_file_nowait.
Copying is now threadable!
Add fsck_thread and xmp_init/xmp_destroy. The fsck_handler_*
functions are not done yet, but are ready to fill in.
Relabeled all xmp_ functions to cornfs_;
Reworked open() function quite a bit: moved much
of the copy to cache logic to download_to_cache();
Defined corn_file_info struct, used to pass open()
file descriptor to read() and write();
Moved code from release() to upload_from_cache(),
added to fsck_cache_handler()
Filled in fsck_meta_handler() and fsck_state_handler()
Fixed some logic errors in fsck_import_handler()
Filesystem appears to fsck correctly now.
Add control_file_read()/write() and corn_file_info structure updates to handle control file IO.
Profiled the code; found cache_update()/cache_insert() to be the biggest culprits
Made corn_cache a doubly linked list to remedy the above.
Split up cornfs.c into numerous .c source files to simplify coding
Now in a tarball because of above.
Or grab the latest cornfs.tar.bz2 with everything you need.
Things to fix:
The easiest way to build this is to grab fuse-2.5.2.tar.gz and extract it:
$ tar xvzf fuse-2.5.2.tar.gz
Then extract the cornfs.tar.bz2 somewhere and build it:
$ cd /tmp
$ wget http://ian.blenke.com/projects/cornfs/cornfs.tar.bz2
$ cd /tmp ; tar xvjf cornfs.tar.bz2
$ make -C /tmp/cornfs
You should now have a "cornfs" runtime. If not, drop me an email.
The directory tree to make this usable is hardcoded at the moment into the runtime (constants toward the top of the source file).
$ mkdir -p /data/cornfs/cfgs/servers
$ echo /remote/path > /data/cornfs/cfgs/servers/SERVERNAME
$ mkdir -p /data/cornfs/metadata/state
$ mkdir -p /data/cornfs/metadata/cache
$ mkdir -p /data/cornfs/metadata/SERVERNAME
$ mkdir -p /data/cornfs/import/SERVERNAME
The only missing bit is mounting the import/SERVERNAME directories for each filesystem configured in cfgs/servers/. You can use SHFS, NFS, DAVFS2, or whatever the heck your linux kernel supports. CORNFS strives to be filesystem agnostic.
$ cd /data/cornfs/cfgs/servers
$ for server in * ; do mkdir -p /data/cornfs/import/$server ; shfsmount $server:`cat $server` /data/cornfs/import/$server ; done
Now start the cornfs server with a reference path:
$ cd /tmp/cornfs
$ mkdir /mnt/cornfs
$ ./cornfs /mnt/cornfs -d
The "-d" flag adds FUSE debugging.
The lower you set the DEBUG level when building cornfs, the more debugging info will appear (it's an enum, so the verbosity ordering can easily be reversed).
By default, the DEBUG level isn't set at all. In that mode, all debugging is macroed away to oblivion to speed things up.
CornFS is being used in production with NFS over SSH instead of SHFS for stability. If you plan on using CornFS in a production role, please let me know.
Enjoy.
Please excuse this brain dump. As ideas come up, I continue to edit this node. Eventually, some structure will be enforced.
Inspired by SSHFS and SHFS, what would it take to make a filesystem that spans a cluster of servers and exposes aggregate diskspace while still mirroring data?
Exposing a filesystem with FUSE on a master node would be ideal, with some form of WebDAV network access (using something as simple as Apache mod_dav) for client access.
Most distributed filesystems have the idea of a "master" for metadata:
There are others, but these are the "big boys" that I can think of.
There are a couple of distributed filesystems that run without a master server. This isn't trivial to implement:
Storage servers in the cluster might each have some space set aside to this purpose. The easiest way would be to create and mount a loopback file filesystem with the space to be shared:
storage-node$ mkdir -p /data/cornfs/spool/ /data/cornfs/export/
storage-node$ dd if=/dev/zero of=/data/cornfs/spool/storage_fs bs=1M count=5k
storage-node$ mke2fs -F /data/cornfs/spool/storage_fs
storage-node$ mount -o loop /data/cornfs/spool/storage_fs /data/cornfs/export/storage
On the Master, each storage server's remote filesystem would be mounted based on the master's config (which is modeled likewise in a filesystem tree):
master-node$ mkdir -p /data/cornfs/cfgs/nodes
master-node$ cd /data/cornfs/cfgs/nodes
master-node$ echo /data/cornfs/export/storage > storage-node1
master-node$ echo /data/cornfs/export/storage > storage-node2
master-node$ mkdir -p /data/cornfs/import
master-node$ for node in * ; do mkdir -p /data/cornfs/import/$node ; shfsmount $node:`cat $node` /data/cornfs/import/$node ; done
The beauty of this is that shfs caches files and works with pretty much any host you can ssh into (including Windows via Cygwin). There are some shortcomings to shfs: "df -i" doesn't work, extended attributes aren't maintained, and it only works from linux kernels (were there only a Mac port ;)
Each file in the master tree will have a FILE pathname, including the filename.
Ideally, each file would have at least two copies. For our purposes, I'll suggest that this filesystem should endeavor to track two mirrors for every file, and clean up any "extra" copies.
The Master itself should have a few trees for the metadata. This leaves us with a few directory trees:
/data/cornfs/metadata/state/FILE
- the FILE has the same owner, group, permissions, ctime/atime/mtime, and size as the actual FILE (as a sparse file).
- Extended attributes make a great storage for things like the primary and secondary mirror server names (setxattr/getxattr).
/data/cornfs/import/SERVER/FILE
- contains the actual file, if SERVER is one of the FILE mirrors.
/data/cornfs/metadata/SERVER/FILE
- this is a sparse version of the above file, used as a sanity check and for regenerating a SERVER from scratch.
- This local metadata replica of a remote server is the master's opinion of what the server actually holds.
- If something does not exist in this copy, but exists on the server, it should be removed from that server.
- If something exists in this copy but not on the server, corruption has occurred.
/data/cornfs/metadata/cache/FILE
- a directory tree containing the past N days worth of accessed FILEs (pruned via cron)
This ends up requiring more than twice the number of actual file inodes to represent the full filesystem on the master. One full copy of the entire metadata state, one copy spread across all of the servers for their metadata state replica on the master server, and some fraction of the filesystem in cache for frequent and/or recent file access.
The Master filesystem would be mounted somewhere handy to be filled, like /master:
master$ mkdir /master
master$ /opt/cornfs/current/bin/cornfs /master
Any new files created under /master would be written to the cache until the user closes the file. On file close, the Master needs to:
When release() is called for a file, if any write() calls were used on the file, it should have been flagged as "dirty" (by an associative array in memory, along with an extended attribute just in case the running daemon is killed). If a file is dirty, it needs to be written out to the mirrors on release(). If a file is clean, don't do anything at all! The file is handily in the cache for the next access.
When reading a file:
When moving a file/directory:
When unlinking (removing) a file/directory:
Changing permissions, access times, or ownership would really only affect the /data/cornfs/metadata/state/ sparse file.
Most metadata information would use the state sparse file.
A "helper daemon" needs to run periodically to make sure that servers are accessible.
As metadata state is updated, locking must be used to ensure atomic operations on the metadata tree. We would not want multiple updates to a file to occur out of order due to a delay in a copy operation to a server in the field.
Speed and availability should be consistently monitored to select faster responding mirrors (if possible) and/or noting that nodes are unreachable for file operations to trigger a mirror for a file with a broken mirror.
Symlinks, block/character devices, and other non-files are stored in the metadata state/ tree alongside the sparse files that represent the actual files that are being distributed.
There is no "inode" construct per se, outside of the metadata state/ tree. That is the "master metadata" that most filesystem operations use. Only when reading/writing, opening/closing, moving, or unlinking, do the mounted server filesystems under import/ get involved to hold the data.
Making this a single instance store (ideal for backups) would require just a bit more logic to include an SHA1/MD5 hash encoded as a directory tree (broken up by octet to a path tree structure); something like:
/data/cornfs/metadata/state/SHA1/MD5/object
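A sketch of the fan-out, using two hex characters per directory level (the depth, hash choice, and base path are illustrative design choices, not settled here):

```shell
# Map a file's content hash to a fanned-out directory path so no single
# directory accumulates millions of entries.
hash_path() {
  h=$(sha1sum "$1" | awk '{print $1}')
  p1=$(printf %s "$h" | cut -c1-2)
  p2=$(printf %s "$h" | cut -c3-4)
  rest=$(printf %s "$h" | cut -c5-)
  echo "/data/cornfs/metadata/state/$p1/$p2/$rest"
}
printf 'abc' > /tmp/sis-demo
hash_path /tmp/sis-demo
# → /data/cornfs/metadata/state/a9/99/3e364706816aba3e25717850c26c9cd0d89d
```

Two files with identical content then land on the same object path, which is what makes the single instance store fall out almost for free.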
Another neat extension would be to build a "revision history" of documents in the filesystem by:
This would address files that change, but would not save us from directory trees that are removed. For this, we would want an archive/ metadata tree by datestamp:
Moving files and/or directory trees around in state/ would maintain the extended attributes, effectively retaining the revision history FOR FREE! When files are moved, the mirrors must be moved as well.
Reconstructing things from the revision/ and archive/ trees would be interesting, but well beyond the initial scope of this endeavor.
The quickest way to throw this together would be with the Fuse.pm perl module. I'm actively writing code now.
The eventual goal would be to write a thread aware C version based on the above prototype, primarily for speed reasons.
More to come.. SOON..