branches: front
Clone
clone: git://shithub.us/dippywood/sdbm gits://shithub.us/dippywood/sdbm
push: hjgit://shithub.us/dippywood/sdbm
Last commit
64e31263
– glenda <glenda@dwp9>
authored
on 2026/01/20 02:55
Initial commit
About
sdbm - Substitute DBM
or
Berkeley ndbm for Every UN*X Made Simple
Ozan (oz) Yigit
The Guild of PD Software Toolmakers
Toronto - Canada
oz@nexus.yorku.ca
Implementation is the sincerest form of flattery. - L. Peter
Deutsch
A The Clone of the ndbm library
The sources accompanying this notice - sdbm - consti-
tute the first public release (Dec. 1990) of a complete
clone of the Berkeley UN*X ndbm library. The sdbm library is
meant to clone the proven functionality of ndbm as closely
as possible, including a few improvements. It is practical,
easy to understand, and compatible. The sdbm library is not
derived from any licensed, proprietary or copyrighted
software.
The sdbm implementation is based on a 1978 algorithm
[Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''.
In the course of searching for a substitute for ndbm, I pro-
totyped three different external-hashing algorithms [Lar78,
Fag79, Lit80] and ultimately chose Larson's algorithm as a
basis of the sdbm implementation. The Bell Labs dbm (and
therefore ndbm) is based on an algorithm invented by Ken
Thompson, [Tho90, Tor87] and predates Larson's work.
The sdbm programming interface is totally compatible
with ndbm and includes a slight improvement in database ini-
tialization. It is also expected to be binary-compatible
under most UN*X versions that support the ndbm library.
The sdbm implementation shares the shortcomings of the
ndbm library, as a side effect of various simplifications to
the original Larson algorithm. It does produce holes in the
page file as it writes pages past the end of file. (Larson's
paper include a clever solution to this problem that is a
result of using the hash value directly as a block address.)
On the other hand, extensive tests seem to indicate that
sdbm creates fewer holes in general, and the resulting page-
files are smaller. The sdbm implementation is also faster
than ndbm in database creation. Unlike the ndbm, the sdbm
__________________________
UN*X is not a trademark of any (dis)organization.
- 2 -
store operation will not ``wander away'' trying to split its
data pages to insert a datum that cannot (due to elaborate
worst-case situations) be inserted. (It will fail after a
pre-defined number of attempts.)
Important Compatibility Warning
The sdbm and ndbm libraries cannot share databases: one
cannot read the (dir/pag) database created by the other.
This is due to the differences between the ndbm and sdbm
algorithms, and the hash functions used. It is easy to con-
vert between the dbm/ndbm databases and sdbm by ignoring the
index completely: see dbd, dbu etc.
Notice of Intellectual Property
The entire sdbm library package, as authored by me, Ozan S.
Yigit, is hereby placed in the public domain. As such, the
author is not responsible for the consequences of use of
this software, no matter how awful, even if they arise from
defects in it. There is no expressed or implied warranty for
the sdbm library.
Since the sdbm library package is in the public domain,
this original release or any additional public-domain
releases of the modified original cannot possibly (by defin-
ition) be withheld from you. Also by definition, You (singu-
lar) have all the rights to this code (including the right
to sell without permission, the right to hoard and the right
to do other icky things as you see fit) but those rights are
also granted to everyone else.
Please note that all previous distributions of this
software contained a copyright (which is now dropped) to
protect its origins and its current public domain status
against any possible claims and/or challenges.
Acknowledgments
Many people have been very helpful and supportive. A
partial list would necessarily include Rayan Zacherissen
(who contributed the man page, and also hacked a MMAP ver-
sion of sdbm), Arnold Robbins, Chris Lewis, Bill Davidsen,
__________________________
Torek's discussion [Tor87] indicates that dbm/ndbm
implementations use the hash value to traverse the
radix trie differently than sdbm and as a result, the
page indexes are generated in different order. For
more information, send e-mail to the author.
You cannot really hoard something that is available to
the public at large, but try if it makes you feel any
better.
- 3 -
Henry Spencer, Geoff Collyer, Rich Salz (who got me started
in the first place), Johannes Ruschein (who did the minix
port) and David Tilbrook. I thank you all.
Distribution Manifest and Notes
This distribution of sdbm includes (at least) the following:
CHANGES change log
README this file.
biblio a small bibliography on external hashing
dba.c a crude (n/s)dbm page file analyzer
dbd.c a crude (n/s)dbm page file dumper (for conversion)
dbe.1 man page for dbe.c
dbe.c Janick's database editor
dbm.c a dbm library emulation wrapper for ndbm/sdbm
dbm.h header file for the above
dbu.c a crude db management utility
hash.c hashing function
makefile guess.
pair.c page-level routines (posted earlier)
pair.h header file for the above
readme.ms troff source for the README file
sdbm.3 man page
sdbm.c the real thing
sdbm.h header file for the above
tune.h place for tuning & portability thingies
util.c miscellaneous
dbu is a simple database manipulation program that
tries to look like Bell Labs' cbt utility. It is currently
incomplete in functionality. I use dbu to test out the rou-
tines: it takes (from stdin) tab separated key/value pairs
for commands like build or insert or takes keys for commands
like delete or look.
dbu <build|creat|look|insert|cat|delete> dbmfile
dba is a crude analyzer of dbm/sdbm/ndbm page files. It
scans the entire page file, reporting page level statistics,
and totals at the end.
dbd is a crude dump program for dbm/ndbm/sdbm data-
bases. It ignores the bitmap, and dumps the data pages in
sequence. It can be used to create input for the dbu util-
ity. Note that dbd will skip any NULLs in the key and data
fields, thus is unsuitable to convert some peculiar data-
bases that insist in including the terminating null.
__________________________
The dbd, dba, dbu utilities are quick hacks and are not
fit for production use. They were developed late one
night, just to test out sdbm, and convert some
databases.
- 4 -
I have also included a copy of the dbe (ndbm DataBase
Editor) by Janick Bergeron [janick@bnr.ca] for your pleas-
ure. You may find it more useful than the little dbu util-
ity.
dbm.[ch] is a dbm library emulation on top of ndbm (and
hence suitable for sdbm). Written by Robert Elz.
The sdbm library has been around in beta test for quite
a long time, and from whatever little feedback I received
(maybe no news is good news), I believe it has been func-
tioning without any significant problems. I would, of
course, appreciate all fixes and/or improvements. Portabil-
ity enhancements would especially be useful.
Implementation Issues
Hash functions: The algorithm behind sdbm implementa-
tion needs a good bit-scrambling hash function to be effec-
tive. I ran into a set of constants for a simple hash func-
tion that seem to help sdbm perform better than ndbm for
various inputs:
/*
* polynomial conversion ignoring overflows
* 65599 nice. 65587 even better.
*/
long
dbm_hash(char *str, int len) {
register unsigned long n = 0;
while (len--)
n = n * 65599 + *str++;
return n;
}
There may be better hash functions for the purposes of
dynamic hashing. Try your favorite, and check the pagefile.
If it contains too many pages with too many holes, (in rela-
tion to this one for example) or if sdbm simply stops work-
ing (fails after SPLTMAX attempts to split) when you feed
your NEWS history file to it, you probably do not have a
good hashing function. If you do better (for different
types of input), I would like to know about the function you
use.
Block sizes: It seems (from various tests on a few
machines) that a page file block size PBLKSIZ of 1024 is by
far the best for performance, but this also happens to limit
the size of a key/value pair. Depending on your needs, you
may wish to increase the page size, and also adjust PAIRMAX
(the maximum size of a key/value pair allowed: should always
be at least three words smaller than PBLKSIZ.) accordingly.
The system-wide version of the library should probably be
configured with 1024 (distribution default), as this appears
- 5 -
to be sufficient for most common uses of sdbm.
Portability
This package has been tested in many different UN*Xes
even including minix, and appears to be reasonably portable.
This does not mean it will port easily to non-UN*X systems.
Notes and Miscellaneous
The sdbm is not a very complicated package, at least
not after you familiarize yourself with the literature on
external hashing. There are other interesting algorithms in
existence that ensure (approximately) single-read access to
a data value associated with any key. These are directory-
less schemes such as linear hashing [Lit80] (+ Larson varia-
tions), spiral storage [Mar79] or directory schemes such as
extensible hashing [Fag79] by Fagin et al. I do hope these
sources provide a reasonable playground for experimentation
with other algorithms. See the June 1988 issue of ACM Com-
puting Surveys [Enb88] for an excellent overview of the
field.
References
[Lar78]P.-A. Larson, ``Dynamic Hashing'', BIT, vol. 18,
pp. 184-201, 1978.
[Tho90]Ken Thompson, private communication, Nov. 1990
[Lit80]W. Litwin, `` Linear Hashing: A new tool for file
and table addressing'', Proceedings of the 6th Confer-
ence on Very Large Dabatases (Montreal), pp. 212-223,
Very Large Database Foundation, Saratoga, Calif., 1980.
[Fag79]R. Fagin, J. Nievergelt, N. Pippinger, and H.
R. Strong, ``Extendible Hashing - A Fast Access Method
for Dynamic Files'', ACM Trans. Database Syst., vol. 4,
no.3, pp. 315-344, Sept. 1979.
[Wal84]Rich Wales, ``Discussion of "dbm" data base system'',
USENET newsgroup unix.wizards, Jan. 1984.
[Tor87]Chris Torek, ``Re: dbm.a and ndbm.a archives'',
USENET newsgroup comp.unix, 1987.
[Mar79]G. N. Martin, ``Spiral Storage: Incrementally Aug-
mentable Hash Addressed Storage'', Technical Report
#27, University of Warwick, Coventry, U.K., 1979.
[Enb88]R. J. Enbody and H. C. Du, ``Dynamic Hashing
Schemes'',ACM Computing Surveys, vol. 20, no. 2, pp.
85-113, June 1988.