db_open
NAME
db_open - database access methods
SYNOPSIS
On Solaris, load with -lthread:
cc [ flag ... ] file ... -lthread [ library ... ]
#include <db.h>
int
db_open(const char *file, DBTYPE type,
int flags, int mode, DB_ENV *dbenv, DB_INFO *dbinfo, DB **dbpp);
DESCRIPTION
The DB library is a family of groups of functions that
provides a modular programming interface to transactions
and record-oriented file access. The library includes
support for transactions, locking, logging and file page
caching, as well as various indexed access methods. Many
of the functional groups (e.g., the file page caching
functions) are useful independent of the other DB func-
tions, although some functional groups are explicitly
based on other functional groups (e.g., transactions and
logging). For a general description of the DB package,
see db(3). For a description of the access methods, see
db_open(3). For a description of cursors within access
methods, see db_cursor(3); transactions, see db_txn(3);
the lock manager, see db_lock(3); the log manager, see
db_log(3); the memory pool manager, see db_mpool(3). For
information on configuring the DB transaction processing
environment, and DB support utilities, see db_appinit(3),
db_archive(1), db_checkpoint(1), db_deadlock(1) and
db_recover(1). For information on dumping and reloading
DB databases, see db_dump(1) and db_load(1).
This manual page describes the overall structure of the DB
library access methods.
The currently supported file formats are btree, hashed and
recno. The btree format is a representation of a sorted,
balanced tree structure. The hashed format is an extensi-
ble, dynamic hashing scheme. The recno format supports
fixed or variable length records (optionally retrieved
from a flat text file).
The db_open function opens the database represented by
file for both reading and writing. Files never intended
to be shared or preserved on disk may be created by set-
ting the file parameter to NULL.
The db_open function copies a pointer to a DB structure
(as typedef'd in the <db.h> include file), into the memory
location referenced by dbpp. This structure includes a
set of functions to perform various database actions, as
described below. The db_open function returns the value
of errno on failure and 0 on success.
Note, while most of the access methods use file as the
name of an underlying file on disk, this is not guaranteed.
Also, calling db_open is a reasonably expensive operation.
(This is based on a model where the DBMS keeps a set of
files open for a long time rather than opening and closing
them on each query.)
The type argument is of type DBTYPE (as defined in the
<db.h> include file) and must be set to one of DB_BTREE,
DB_HASH, DB_RECNO or DB_UNKNOWN. If type is DB_UNKNOWN,
the database must already exist and db_open will then
determine if it's of type DB_BTREE, DB_HASH or DB_RECNO.
The flags and mode arguments specify how files will be
opened and/or created when they don't already exist. The
flags value is specified by or'ing together one or more of
the following values:
DB_CREATE
Create any underlying files, as necessary. If the
files do not already exist and the DB_CREATE flag is
not specified, the call will fail.
DB_NOMMAP
Do not map this file (see db_mpool(3) for further
information).
DB_RDONLY
Open the database for reading only. Any attempt to
write the database using the access methods will fail
regardless of the actual permissions of any underly-
ing files.
DB_THREAD
Cause the DB handle returned by the db_open function
to be useable by multiple threads within a single
address space, i.e., to be ``free-threaded''.
DB_TRUNCATE
``Truncate'' the database if it exists, i.e., behave
as if the database were just created, discarding any
previous contents.
All files created by the access methods are created with
mode mode (as described in chmod(2)) and modified by the
process' umask value at the time of creation (see
umask(2)). The group ownership of created files is based
on the system and directory defaults, and is not further
specified by DB.
DB_ENV
The access methods make calls to the other subsystems in
the DB library based on the dbenv argument to db_open,
which is a pointer to a structure of type DB_ENV (type-
def'd in <db.h>). It is expected that applications will
use a single DB_ENV structure as the argument to all of
the subsystems in the DB package. In order to ensure com-
patibility with future releases of DB, all fields of the
DB_ENV structure that are not explicitly set should be
initialized to 0 before the first time the structure is
used. Do this by declaring the structure external or
static, or by calling the C library routine bzero(3) or
memset(3).
The fields of DB_ENV used by db_open are described below.
As references to the DB_ENV structure may be maintained by
db_open, it is necessary that the DB_ENV structure and
memory it references be valid until after the close func-
tion is called. If dbenv is NULL or any of its fields are
set to 0, defaults appropriate for the system are used
where possible.
The following DB_ENV fields may be initialized before
calling db_open:
DB_LOG *lg_info;
If modifications to the file being opened should be
logged, the lg_info field contains a return value
from the function log_open. If lg_info is NULL, no
logging is done by the DB access methods.
DB_LOCKTAB *lk_info;
If locking is required for the file being opened (as
is the case when multiple processes or threads are
accessing the same file), the lk_info field contains
a return value from the function lock_open. If
lk_info is NULL, no locking is done by the DB access
methods.
If both locking and transactions are being performed
(i.e., both lk_info and tx_info are non-NULL), the
transaction ID will be used as the locker ID. If
only locking is being performed, db_open will acquire
a locker ID from lock_id(3), and will use it for all
locks required for this instance of db_open.
DB_MPOOL *mp_info;
If the cache for the file being opened should be
maintained in a shared buffer pool, the mp_info field
contains a return value from the function memp_open.
If mp_info is NULL, a memory pool may still be cre-
ated by DB, but it will be private to the application
and managed by DB.
DB_TXNMGR *tx_info;
If the accesses to the file being opened should take
place in the context of transactions (providing atom-
icity and error recovery), the tx_info field contains
a return value from the function txn_open (see
db_txn(3)). If transactions are specified, the
application is responsible for making suitable calls
to txn_begin, txn_abort, and txn_commit. If tx_info
is NULL, no transaction support is done by the DB
access methods.
When the access methods are used in conjunction with
transactions, the application must abort the transac-
tion (using txn_abort) if any of the transaction pro-
tected access method calls (i.e., any calls other
than open, close and sync) returns an error value.
As described by db(3), an error value is any value
greater than 0.
DB_INFO
The access methods are configured using the DB_INFO data
structure argument to db_open. The DB_INFO structure is
typedef'd in <db.h> and has a large number of fields, most
specific to a single access method, although a few are
shared. The fields that are common to all access methods
are listed here; those specific to an individual access
method are described below. No reference to the DB_INFO
structure is maintained by DB, so it is possible to dis-
card it as soon as the db_open call returns.
In order to ensure compatibility with future releases of
DB, all fields of the DB_INFO structure should be initial-
ized to 0 before the structure is used. Do this by
declaring the structure external or static, or by calling
the C library function bzero(3) or memset(3).
If possible, defaults appropriate for the system are used
for the DB_INFO fields if dbinfo is NULL or any fields of
the DB_INFO structure are set to 0. The following DB_INFO
fields may be initialized before calling db_open:
size_t db_cachesize;
A suggested maximum size of the memory pool cache, in
bytes. If db_cachesize is 0, an appropriate default
is used. If the mp_info field is also specified,
this field is ignored.
Note, the minimum number of pages in the cache should
be no less than 10, and the access methods will fail
if an insufficiently large cache is specified. In
addition, for applications that exhibit strong local-
ity in their data access patterns, increasing the
size of the cache can significantly improve applica-
tion performance.
int db_lorder;
The byte order for integers in the stored database
metadata. The number should represent the order as
an integer; for example, big endian order is the
number 4321, and little endian order is the number
1234. If db_lorder is 0, the host order of the
machine where the DB library was compiled is used.
The access methods provide no guarantees about the
byte ordering of the data stored in the database, and
applications are responsible for maintaining any nec-
essary ordering.
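The db_lorder values can be related to the host as in the
following minimal sketch. The function names host_lorder,
swap32 and to_big_endian32 are hypothetical helpers and are
not part of the DB interface; the sketch assumes a host that
is either big or little endian. Storing integers in one
fixed byte order is one way for an application to keep its
own data portable, since DB does not do this for it.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: report the host byte order using the same
 * numbering as db_lorder (1234 little endian, 4321 big endian).
 */
int
host_lorder(void)
{
	uint32_t probe = 1;
	unsigned char first;

	/* Look at the byte stored at the lowest address. */
	memcpy(&first, &probe, 1);
	return (first == 1 ? 1234 : 4321);
}

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
uint32_t
swap32(uint32_t v)
{
	return ((v >> 24) | ((v >> 8) & 0x0000ff00u) |
	    ((v << 8) & 0x00ff0000u) | (v << 24));
}

/*
 * Hypothetical helper: convert a host-order value to big endian
 * order before storing it as application data.
 */
uint32_t
to_big_endian32(uint32_t v)
{
	return (host_lorder() == 4321 ? v : swap32(v));
}
```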
size_t db_pagesize;
The size of the pages used to hold items in the
database, in bytes. The minimum page size is 512
bytes and the maximum page size is 64K bytes. If
db_pagesize is 0, a page size is selected based on
the underlying filesystem I/O block size. The
selected size has a lower limit of 512 bytes and an
upper limit of 16K bytes.
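The limits above can be expressed as a small check. This is
only a sketch: the macro and function names are
hypothetical, and clamping the filesystem block size is an
assumption about how a default might be chosen, since the
text gives only the lower and upper limits of the selected
size.

```c
#include <stddef.h>

#define	MIN_PAGESIZE		512u		/* minimum db_pagesize */
#define	MAX_PAGESIZE		(64u * 1024u)	/* maximum db_pagesize */
#define	MAX_AUTO_PAGESIZE	(16u * 1024u)	/* limit when db_pagesize is 0 */

/*
 * Hypothetical check: return 1 if the value is acceptable for
 * db_pagesize (0 means "let DB choose"), otherwise 0.
 */
int
pagesize_in_range(size_t pagesize)
{
	return (pagesize == 0 ||
	    (pagesize >= MIN_PAGESIZE && pagesize <= MAX_PAGESIZE));
}

/*
 * Hypothetical default: clamp the filesystem I/O block size to the
 * documented limits of 512 bytes and 16K bytes.
 */
size_t
auto_pagesize(size_t io_blocksize)
{
	if (io_blocksize < MIN_PAGESIZE)
		return (MIN_PAGESIZE);
	if (io_blocksize > MAX_AUTO_PAGESIZE)
		return (MAX_AUTO_PAGESIZE);
	return (io_blocksize);
}
```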
void *(*db_malloc)(size_t);
The flag DB_DBT_MALLOC, when specified in the DBT
structure, will cause the DB library to allocate mem-
ory which then becomes the responsibility of the
calling application.
On systems where separate heaps are maintained for
applications and libraries (notably Windows NT),
specifying the DB_DBT_MALLOC flag will fail because
the DB library will allocate memory from a different
heap than the application will use to free it. To
avoid this problem, the db_malloc field should be set
to point to the application's allocation routine. If
db_malloc is non-NULL, it will be used to allocate
the memory returned when the DB_DBT_MALLOC flag is
set. The db_malloc function must match the calling
conventions of the malloc(3) library routine.
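A routine suitable for the db_malloc field might look like
the following sketch. The names app_malloc,
app_bytes_requested and db_malloc_hook are hypothetical;
the only requirement taken from the text is that the
function has the same calling convention as malloc(3).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter of bytes requested through app_malloc. */
size_t app_bytes_requested;

/*
 * Hypothetical application allocator with the malloc(3) calling
 * convention.  Memory obtained here comes from the application's
 * heap, so the application can release it with free(3).
 */
void *
app_malloc(size_t size)
{
	app_bytes_requested += size;
	return (malloc(size));
}

/*
 * A pointer of the same type as the db_malloc field; in a real
 * program app_malloc would be assigned to that field instead.
 */
void *(*db_malloc_hook)(size_t) = app_malloc;
```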
BTREE
The btree data structure is a sorted, balanced tree
structure storing associated key/data pairs. Searches,
insertions, and deletions in the btree will all complete in
O(lg base N), where base is the average number of keys per
page. Often, inserting ordered data into btrees results in
pages that are half-full. This implementation has been
modified to make ordered (or inverse ordered) insertion
the best case, resulting in nearly perfect page space
utilization.
Space freed by deleting key/data pairs from the database
is never reclaimed, although it is reused where possible.
This means that the btree storage structure is grow-only.
If sufficiently many keys are deleted from a tree that
shrinking the tree is desirable, this can be accomplished
by periodically creating a new tree from a scan of the
existing one.
The following additional fields and flags may be initial-
ized before calling db_open, when using the btree access
method:
int (*bt_compare)(const DBT *, const DBT *);
The bt_compare field is the key comparison function.
It must return an integer less than, equal to, or
greater than zero if the first key argument is
considered to be respectively less than, equal to, or
greater than the second key argument. The same
comparison function must be used on a given tree every
time it is opened. If bt_compare is NULL, the keys are
compared lexically, with shorter keys collating before
longer keys.
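A comparison function with the bt_compare signature can be
sketched as follows. The DBT typedef is a local stand-in
that mirrors the layout documented in this page (with
uint32_t in place of u_int32_t); real code takes it from
<db.h>. The function name compare_lexical is hypothetical,
and the ordering it implements (bytewise comparison, with
the shorter key first when one key is a prefix of the
other) is one reading of the default ordering, not
necessarily the library's own routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the DBT structure; normally provided by <db.h>. */
typedef struct {
	void *data;
	uint32_t size;
	uint32_t ulen;
	uint32_t dlen;
	uint32_t doff;
	uint32_t flags;
} DBT;

/*
 * Hypothetical comparison function: returns <0, 0 or >0 when the
 * first key sorts before, equal to, or after the second key.
 */
int
compare_lexical(const DBT *a, const DBT *b)
{
	size_t n;
	int r;

	assert(a != NULL && b != NULL);

	/* Compare the bytes the two keys have in common. */
	n = a->size < b->size ? a->size : b->size;
	r = n != 0 ? memcmp(a->data, b->data, n) : 0;
	if (r != 0)
		return (r);

	/* One key is a prefix of the other: the shorter sorts first. */
	if (a->size < b->size)
		return (-1);
	if (a->size > b->size)
		return (1);
	return (0);
}
```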
int bt_minkey;
The minimum number of keys that will be stored on any
single page. This value is used to determine which
keys will be stored on overflow pages, i.e. if a key
or data item is larger than the pagesize divided by
the minkey value, it will be stored on overflow pages
instead of in the page itself. The bt_minkey value
specified must be at least 2; if bt_minkey is 0, a
value of 2 is used.
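The overflow rule for bt_minkey amounts to a single
division, sketched below. The function names are
hypothetical, and the sketch ignores any per-page overhead
the library may account for; it only restates the rule that
an item larger than the page size divided by the minkey
value goes to overflow pages.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: largest item size that still fits on the
 * page itself, given the page size and the bt_minkey setting.
 */
size_t
overflow_threshold(size_t pagesize, int bt_minkey)
{
	size_t minkey;

	/* bt_minkey must be 0 (use the default) or at least 2. */
	assert(bt_minkey == 0 || bt_minkey >= 2);

	minkey = bt_minkey == 0 ? 2 : (size_t)bt_minkey;
	return (pagesize / minkey);
}

/* Hypothetical helper: 1 if the item would be stored on overflow pages. */
int
goes_to_overflow(size_t item_size, size_t pagesize, int bt_minkey)
{
	return (item_size > overflow_threshold(pagesize, bt_minkey));
}
```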
size_t (*bt_prefix)(const DBT *, const DBT *);
The bt_prefix field is the prefix comparison function.
If specified, this function must return the number of bytes
of the second key argument that are necessary to
determine that it is greater than the first key argu-
ment. If the keys are equal, the key length should
be returned.
This is used to compress the keys stored on the btree
internal pages. The usefulness of this is data
dependent, but in some data sets can produce signifi-
cantly reduced tree sizes and search times. If
bt_prefix is NULL, and no comparison function is
specified, a default lexical comparison function is
used. If bt_prefix is NULL and a comparison function
is specified, no prefix comparison is done.
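A prefix function matching the description above can be
sketched as follows. The DBT typedef is a local stand-in
for the structure from <db.h>, and the name prefix_bytes is
hypothetical. The sketch assumes the keys are compared
bytewise and that the first key does not sort after the
second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the DBT structure; normally provided by <db.h>. */
typedef struct {
	void *data;
	uint32_t size;
	uint32_t ulen;
	uint32_t dlen;
	uint32_t doff;
	uint32_t flags;
} DBT;

/*
 * Hypothetical prefix function: number of bytes of the second key
 * needed to show that it is greater than the first key.
 */
size_t
prefix_bytes(const DBT *a, const DBT *b)
{
	const unsigned char *p1, *p2;
	size_t i, len;

	assert(a != NULL && b != NULL);

	p1 = a->data;
	p2 = b->data;
	len = a->size < b->size ? a->size : b->size;

	/* The first differing byte decides the order. */
	for (i = 0; i < len; i++)
		if (p1[i] != p2[i])
			return (i + 1);

	/*
	 * No difference in the common bytes.  If the first key is
	 * shorter, one more byte of the second key is needed;
	 * otherwise the keys are equal and the key length is returned.
	 */
	return (a->size < b->size ? (size_t)a->size + 1 : (size_t)a->size);
}
```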
unsigned long flags;
The following additional flags may be specified by
or'ing together one or more of the following values:
DB_DUP
Permit duplicate keys in the tree, i.e. inser-
tion when the key of the key/data pair being
inserted already exists in the tree will be suc-
cessful. The ordering of duplicates in the tree
is determined by the order of insertion, unless
the ordering is otherwise specified by use of a
cursor (see db_cursor(3) for more information).
HASH
The hash data structure is an extensible, dynamic hashing
scheme. Backward compatible interfaces to the functions
described in dbm(3), ndbm(3) and hsearch(3) are provided;
however, these interfaces are not compatible with previous
file formats.
The following additional fields and flags may be initial-
ized before calling db_open, when using the hash access
method:
unsigned int h_ffactor;
The h_ffactor field indicates a desired density within
the hash table. It is an approximation of the number of keys
allowed to accumulate in any one bucket, determining
when the hash table grows or shrinks. The default
value is 0, indicating that the fill factor will be
selected dynamically as pages are filled.
u_int32_t (*h_hash)(const void *, u_int32_t);
The h_hash field is a user defined hash function; if
h_hash is NULL, a default hash function is used.
Since no hash function performs equally well on all
possible data, the user may find that the built-in
hash function performs poorly with a particular data
set. User specified hash functions must take a
pointer to a byte string and a length as arguments
and return a u_int32_t value.
If a hash function is specified, db_open will
attempt to determine if the hash function specified
is the same as the one with which the database was
created, and will fail if it detects that it is not.
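A user-specified hash function with the required shape
(pointer to a byte string and a length in, 32-bit value
out) can be sketched as follows. The sketch uses the
well-known 32-bit FNV-1a algorithm purely as an
illustration; it is not the hash function built into DB.
The name fnv1a_hash is hypothetical, and uint32_t stands in
for u_int32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical h_hash candidate: 32-bit FNV-1a over the byte string.
 * The same function must be supplied every time the database is
 * opened.
 */
uint32_t
fnv1a_hash(const void *bytes, uint32_t length)
{
	const unsigned char *p = bytes;
	uint32_t h = 2166136261u;	/* FNV offset basis */
	uint32_t i;

	assert(bytes != NULL || length == 0);

	for (i = 0; i < length; i++) {
		h ^= p[i];
		h *= 16777619u;		/* FNV prime */
	}
	return (h);
}
```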
unsigned int h_nelem;
An estimate of the final size of the hash table. If
not set or set too low, hash tables will expand
gracefully as keys are entered, although a slight
performance degradation may be noticed. The default
value is 1.
unsigned long flags;
The following additional flags may be specified by
or'ing together one or more of the following values:
DB_DUP
Permit duplicate keys in the database, i.e. insertion
when the key of the key/data pair being inserted
already exists in the database will be successful.
The ordering of duplicates is determined by the
order of insertion, unless the ordering is otherwise
specified by use of a cursor (see db_cursor(3) for
more information).
RECNO
The recno access method provides support for fixed and
variable length records, optionally backed by a flat text
(byte stream) file. Both fixed and variable length
records are accessed by their logical record number.
The logical record numbers are mutable and change as lines
are added to and deleted from the file. For example, the
existence of record number five requires the existence of
records one through four, and the deletion of record num-
ber one causes records numbered two through five to be
renumbered to be records numbered one through four. If a
cursor were positioned after record number one, it would
be shifted down one logical record as well, continuing to
reference the same record as before. For this reason,
concurrent access to a recno database may be largely mean-
ingless, although it is supported.
Using the c_put or put interfaces to create new records
will cause the creation of multiple, empty records if the
record number is more than one greater than the largest
record currently in the database. For example, the cre-
ation of record number five, when records one through four
do not exist, causes their logical creation with zero-
length data. If the created record is not at the end of
the database, all records following the new record will be
automatically renumbered.
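The record numbering rules above reduce to simple
arithmetic, sketched below. Both function names are
hypothetical, uint32_t is used as an assumed record number
type, and record numbers are taken to start at one as in
the examples.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of empty records created implicitly
 * when record number recno is stored and the largest existing
 * record number is largest (0 for an empty database).
 */
uint32_t
implicit_empty_records(uint32_t largest, uint32_t recno)
{
	if (recno > largest && recno - largest > 1)
		return (recno - largest - 1);
	return (0);
}

/*
 * Hypothetical helper: new number of record recno after record
 * deleted has been removed.  Returns 0 for the deleted record itself.
 */
uint32_t
renumber_after_delete(uint32_t recno, uint32_t deleted)
{
	assert(recno >= 1 && deleted >= 1);

	if (recno == deleted)
		return (0);
	return (recno > deleted ? recno - 1 : recno);
}
```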
The following additional fields and flags may be initial-
ized before calling db_open, when using the recno access
method:
int re_delim;
For variable length records, if the re_source file is
specified and the DB_DELIMITER flag is set, the
delimiting byte used to mark the end of a record in
the source file. If the re_source file is specified
and the DB_DELIMITER flag is not set, <newline> char-
acters (i.e. ``\n'', 0x0a) are interpreted as end-of-
record markers.
u_int32_t re_len;
The length of a fixed-length record.
int re_pad;
For fixed length records, if the DB_PAD flag is set,
the pad character for short records. If the DB_PAD
flag is not set, <space> characters (i.e., 0x20) are
used for padding.
char *re_source;
The purpose of the re_source field is to provide fast
access and modification to databases that are nor-
mally stored as flat text files. In this case, no
index is maintained across calls to db_open.
If the re_source field is non-NULL, it specifies an
underlying flat text database file that is read to
initialize a transient record number index. In the
case of variable length records, the records are sep-
arated by the byte value re_delim. For example,
standard UNIX byte stream files can be interpreted as
a sequence of variable length records separated by
<newline> characters.
In addition, when cached data would normally be writ-
ten back to the underlying database file (e.g., the
close or sync functions are called), the in-memory
copy of the database is written back to the re_source
file. When the close function is called, the in-mem-
ory copy of the database is discarded.
Because there is no meta-data associated with the
underlying source file, any differences from the
default values (e.g., fixed record length or byte
separator value) must be explicitly specified each
time the file is opened.
Because the close and sync functions write a backing
file that is not transactionally protected, it is an
error to specify a re_source file and either the
DB_THREAD flag or a non-NULL tx_info field in the
DB_ENV argument to db_open.
The re_source file must already exist (but may be
zero-length) when db_open is called.
unsigned long flags;
The following additional flags may be specified by
or'ing together one or more of the following values:
DB_DELIMITER
The re_delim field is set.
DB_FIXEDLEN
The records are fixed-length, not byte delim-
ited. The structure element re_len specifies
the length of the record, and the structure ele-
ment re_pad is used as the pad character.
Any records added to the database that are less
than re_len bytes long are automatically padded.
Any attempt to insert records into the database
that are greater than re_len bytes long will
cause the call to fail immediately and return an
error.
DB_PAD
The re_pad field is set.
DB_SNAPSHOT
This flag requires that a copy of any specified
re_source file be taken immediately when db_open
is called. If this flag is not specified, DB
may choose to retrieve unmodified records from
the re_source file (modified records must be
stored elsewhere since they could potentially
cause the re_source file to change in size).
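The DB_FIXEDLEN behavior described above (short records
are padded, over-long records are rejected) can be sketched
as follows. The function pad_fixed_record is a hypothetical
illustration of the rule and does not call into DB; the
caller passes 0x20 as the pad byte to model the default
used when DB_PAD is not set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build a fixed-length record of re_len bytes
 * in dst from len bytes at src, padding with re_pad.  Returns 0 on
 * success and -1 if the source is longer than re_len.
 */
int
pad_fixed_record(unsigned char *dst, size_t re_len,
    const void *src, size_t len, int re_pad)
{
	assert(dst != NULL && (src != NULL || len == 0));

	/* Records longer than re_len are rejected, not truncated. */
	if (len > re_len)
		return (-1);

	if (len != 0)
		memcpy(dst, src, len);
	memset(dst + len, re_pad, re_len - len);
	return (0);
}
```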
KEY/DATA PAIRS
Storage and retrieval for the access methods are based on
key/data pairs. Key and data byte strings may reference
strings of essentially unlimited length, although any two
keys must fit into available memory at the same time so
that they may be compared and any one data item must fit
into available memory so that it may be returned.
The access methods provide no guarantees about byte string
alignment, and applications are responsible for maintain-
ing any necessary alignment. Use the DB_DBT_USERMEM flag
to cause returned items to be placed in memory of arbi-
trary alignment.
Both keys and data are represented by the following data
structure:
typedef struct {
void *data;
u_int32_t size;
u_int32_t ulen;
u_int32_t dlen;
u_int32_t doff;
u_int32_t flags;
} DBT;
In order to ensure compatibility with future releases of
DB, all fields of the DBT structure that are not explic-
itly set should be initialized to 0 before the first time
the structure is used. Do this by declaring the structure
external or static, or by calling the C library routine
bzero(3) or memset(3).
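Both ways of zeroing a DBT can be sketched as follows. The
DBT typedef is a local stand-in that mirrors the structure
shown above (with uint32_t in place of u_int32_t); real
code takes it from <db.h>. The names global_key, dbt_clear
and dbt_set are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the DBT structure; normally provided by <db.h>. */
typedef struct {
	void *data;
	uint32_t size;
	uint32_t ulen;
	uint32_t dlen;
	uint32_t doff;
	uint32_t flags;
} DBT;

/* An external declaration: all fields start out as 0. */
DBT global_key;

/* Hypothetical helper: zero a DBT that lives on the stack or heap. */
void
dbt_clear(DBT *dbt)
{
	assert(dbt != NULL);
	memset(dbt, 0, sizeof(*dbt));
}

/*
 * Hypothetical helper: zero the DBT, then set only the fields the
 * application cares about.
 */
void
dbt_set(DBT *dbt, void *bytes, uint32_t nbytes)
{
	dbt_clear(dbt);
	dbt->data = bytes;
	dbt->size = nbytes;
}
```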
By default, the flags structure element is expected to be
0. In this default case, when being provided a key or
data item by the application, the DB package expects the
data structure element to point to a byte string of size
bytes. When returning a key/data item to the application,
the DB package will store into the data structure element
a pointer to a byte string of size bytes. By default, the
memory referenced by this stored pointer is only valid
until the next call to the DB package using the DB handle
returned by db_open.
The elements of the DBT structure are defined as follows:
void *data;
A pointer to a byte string.
u_int32_t size;
The length of data, in bytes.
u_int32_t ulen;
The size of the user's buffer (referenced by data),
in bytes. This location is not written by the DB
functions. See the DB_DBT_USERMEM flag for more
information.
u_int32_t dlen;
The length of the partial record being read or writ-
ten by the application, in bytes. See the
DB_DBT_PARTIAL flag for more information.
u_int32_t doff;
The offset of the partial record being read or writ-
ten by the application, in bytes. See the
DB_DBT_PARTIAL flag for more information.
u_int32_t flags;
The flags value is specified by or'ing together one
or more of the following values:
DB_DBT_MALLOC
Ignored except when retrieving information from
a database, e.g., a get call. This flag causes
DB to allocate memory for the returned key or
data item (using malloc(3)) and return a pointer
to it in the data field of the key or data DBT
structure. The allocated memory becomes the
responsibility of the calling application. It
is an error to specify both DB_DBT_MALLOC and
DB_DBT_USERMEM.
DB_DBT_USERMEM
Ignored except when retrieving information from
a database, e.g., a get call. The data field of
the key or data structure must reference memory
that is at least ulen bytes in length. If the
length of the requested item is less than or
equal to that number of bytes, the item is
copied into the memory referenced by the data
field. Otherwise, an error is returned, the
size field is set to the length needed for the
requested item, and the errno variable is set to
ENOMEM. It is an error to specify both
DB_DBT_MALLOC and DB_DBT_USERMEM.
DB_DBT_PARTIAL
Ignored except when specified for a data parame-
ter, where this flag causes the partial
retrieval or storage of an item. If the calling
application is doing a get, the dlen bytes
starting doff bytes from the beginning of the
retrieved data record are returned as if they
comprised the entire record. If the specified
bytes do not exist in the record, the get is
successful, and 0 bytes are returned.
For example, if the data portion of a retrieved
record was 100 bytes, and a partial retrieval
was done using a DBT having a dlen field of 20
and a doff field of 85, the get call would suc-
ceed, the data field would reference the last 15
bytes of the record, and the size field would be
set to 15.
If the calling application is doing a put, the
dlen bytes starting doff bytes from the begin-
ning of the specified key's data record are
replaced by the data specified by the data and
size structure elements. If dlen is smaller
than size, the record will grow, and if dlen is
larger than size, the record will shrink. If
the specified bytes do not exist, the record
will be extended using nul bytes as necessary,
and the put call will succeed.
It is an error to attempt a partial put using
the db_open returned put function in a database
that supports duplicate records. Partial puts
in databases supporting duplicate records must
be done using a db_cursor function. It is an
error to attempt a partial put with differing
dlen and size values in a recno database with
fixed-length records.
For example, if the data portion of a retrieved
record was 100 bytes, and a partial store was
done using a DBT having a dlen field of 20, a
doff field of 85, and a size field of 30, the
resulting record would be 115 bytes in length,
where the last 30 bytes would be those specified
by the put call.
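The arithmetic behind the two DB_DBT_PARTIAL examples can
be checked with the following sketch. The function names
are hypothetical and the code only models the resulting
lengths; it does not call into DB. The case of a put whose
doff lies beyond the end of the record assumes the gap is
filled with nul bytes up to doff.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes returned by a partial get of
 * dlen bytes at offset doff from a record of reclen bytes.
 */
uint32_t
partial_get_size(uint32_t reclen, uint32_t doff, uint32_t dlen)
{
	uint32_t avail;

	/* Nothing exists at or beyond the end of the record. */
	if (doff >= reclen)
		return (0);

	avail = reclen - doff;
	return (dlen < avail ? dlen : avail);
}

/*
 * Hypothetical helper: length of the record after a partial put
 * that replaces dlen bytes at offset doff with size new bytes.
 */
uint32_t
partial_put_size(uint32_t reclen, uint32_t doff, uint32_t dlen,
    uint32_t size)
{
	uint32_t replaced;

	/* Assumed: the record is nul-extended up to doff first. */
	if (doff >= reclen)
		return (doff + size);

	/* Only bytes that actually exist can be replaced. */
	replaced = partial_get_size(reclen, doff, dlen);
	return (reclen - replaced + size);
}
```

With the values from the examples, partial_get_size(100,
85, 20) yields 15 and partial_put_size(100, 85, 20, 30)
yields 115.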
When multiple threads are using the returned DB handle
concurrently, either the DB_DBT_MALLOC or DB_DBT_USERMEM
flag must be specified for any DBT used for key or data
retrieval.
The data part of the key/data pair used to access fixed
and variable length records (the recno access method) is
the same as the other access methods. The key, used to
specify the logical record number, is different.
In the case of the recno access method, the data field of
the key is a pointer to a memory location of type
db_recno_t, typedef'd in the <db.h> include file. This
type is normally the largest unsigned integral type avail-
able to the implementation. The size field of the key
should be the size of that type, e.g.,
``sizeof(db_recno_t)''.
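Setting up a recno key can be sketched as follows. The DBT
and db_recno_t typedefs are local stand-ins (db_recno_t is
assumed here to be a 32-bit unsigned type); real code takes
both from <db.h>. The function name recno_key is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins; both types are normally provided by <db.h>. */
typedef uint32_t db_recno_t;	/* assumed width */
typedef struct {
	void *data;
	uint32_t size;
	uint32_t ulen;
	uint32_t dlen;
	uint32_t doff;
	uint32_t flags;
} DBT;

/*
 * Hypothetical helper: make key refer to the record number stored
 * at recnop.  The storage at recnop must outlive the use of key.
 */
void
recno_key(DBT *key, db_recno_t *recnop)
{
	assert(key != NULL && recnop != NULL);

	memset(key, 0, sizeof(*key));
	key->data = recnop;
	key->size = sizeof(db_recno_t);
}
```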
DB OPERATIONS
The DB structure returned by db_open describes a database
type, and includes a set of functions to perform various
actions, as described below. Each of these functions
takes a pointer to a DB structure, and may take one or
more DBT *'s and a flag value as well. Individual access
methods may specify additional functions and flags which
are specific to the method. The fields of the DB struc-
ture are as follows:
DBTYPE type;
The type of the underlying access method (and file
format). Set to one of DB_BTREE, DB_HASH or
DB_RECNO. This field may be used to determine the
type of the database after a return from db_open with
the type argument set to DB_UNKNOWN.
int (*close)(DB *db, int flags);
A pointer to a function to flush any cached informa-
tion to disk, close any open cursors (see db_cur-
sor(3)), free any allocated resources, and close any
underlying files. Since key/data pairs are cached in
memory, failing to sync the file with the close or
sync function may result in inconsistent or lost
information.
The flags parameter must be set to 0 or the following
value:
DB_NOSYNC
Do not flush cached information to disk.
The DB_NOSYNC flag is a dangerous option. It should
only be set if the application is doing logging (with
or without transactions) so that the database is
recoverable after a system or application crash, or
if the database is always generated from scratch
after any system or application crash.
It is important to understand that flushing cached
information to disk only minimizes the window of
opportunity for corrupted data. While unlikely, it
is possible for database corruption to happen if a
system or application crash occurs while writing data
to the database. To ensure that database corruption
never occurs, applications must either: use logging
to guarantee recoverability, or edit a copy of the
database, and, once all applications using the
database have successfully called close, replace the
original database with the updated copy.
When multiple threads are using the DB handle concur-
rently, only a single thread may call the DB handle
close function.
The close function returns the value of errno on
failure and 0 on success.
int (*cursor)(DB *db, DB_TXN *txnid, DBC **cursorp);
A pointer to a function to create a cursor and copy a
pointer to it into the memory referenced by cursorp.
A cursor is a structure used to provide sequential
access through a database. This interface and its
associated functions replace the functionality provided
by the seq function in previous releases of the
DB library.
If the file is being accessed under transaction pro-
tection, the txnid parameter is a transaction ID
returned from txn_begin; otherwise, it is NULL. If
transaction protection is enabled, cursors must be opened
and closed within the context of a transaction, and
the txnid parameter specifies the transaction context
in which the cursor may be used. See db_cursor(3)
for more information.
The cursor function returns the value of errno on
failure and 0 on success.
int (*del)(DB *db, DB_TXN *txnid, DBT *key, int flags);
A pointer to a function to remove key/data pairs from
the database. The key/data pair associated with the
specified key is discarded from the database. In the
presence of duplicate key values, all records associ-
ated with the designated key will be discarded.
If the file is being accessed under transaction pro-
tection, the txnid parameter is a transaction ID
returned from txn_begin; otherwise, it is NULL.
The flags parameter is currently unused, and must be
set to 0.
The del function returns the value of errno on fail-
ure, 0 on success, and DB_NOTFOUND if the specified
key did not exist in the file.
int (*fd)(DB *db, int *fdp);
A pointer to a function that copies a file descriptor
representative of the underlying database into the
memory referenced by fdp. A file descriptor refer-
encing the same file will be returned to all pro-
cesses that call db_open with the same file argument.
This file descriptor may be safely used as an argu-
ment to the fcntl(2) and flock(2) locking functions.
The file descriptor is not necessarily associated
with any of the underlying files used by the access
method.
The fd function was introduced in early versions of
DB, before the lock manager was added, to support a
coarse-grained form of locking. Applications should
be converted to use the lock manager where possible,
and this interface should not be used by new applica-
tions.
The fd function returns the value of errno on failure
and 0 on success.
int (*get)(DB *db, DB_TXN *txnid,
DBT *key, DBT *data, int flags);
A pointer to a function that is an interface for
keyed retrieval from the database. The address and
length of the data associated with the specified key
are returned in the structure referenced by data.
In the presence of duplicate key values, get will
return the first data item for the designated key.
Duplicates are sorted by insert order except where
this order has been overridden by cursor operations.
Retrieval of duplicates requires the use of cursor
operations. See db_cursor(3) for details.
If the file is being accessed under transaction pro-
tection, the txnid parameter is a transaction ID
returned from txn_begin; otherwise, it is NULL.
The flags parameter is currently unused, and must be
set to 0.
The get function returns the value of errno on fail-
ure, 0 on success, and DB_NOTFOUND if the key was not
found.
int (*put)(DB *db, DB_TXN *txnid,
DBT *key, DBT *data, int flags);
A pointer to a function to store key/data pairs in
the database. If the database supports duplicates,
the put function adds the new data value at the end
of the duplicate set.
If the file is being accessed under transaction pro-
tection, the txnid parameter is a transaction ID
returned from txn_begin; otherwise, it is NULL.
The flags parameter must be set to 0 or the following
value:
DB_NOOVERWRITE
Enter the new key/data pair only if the key does
not already appear in the database.
The default behavior of the put function is to enter
the new key/data pair, replacing any previously
existing key if duplicates are disallowed, or to add
a duplicate entry if duplicates are allowed. Even if
the designated database allows duplicates, a call to
put with the DB_NOOVERWRITE flag set will fail if the
key already exists in the database.
The put function returns the value of errno on fail-
ure, 0 on success, and DB_KEYEXIST if the DB_NOOVER-
WRITE flag was set and the key already exists in the
file.
int (*sync)(DB *db, int flags);
A pointer to a function to flush any cached informa-
tion to disk. If the database is in memory only, the
sync function has no effect and will always succeed.
The flags parameter is currently unused, and must be
set to 0.
See the close function description above for a dis-
cussion of DB and cached data.
The sync function returns the value of errno on fail-
ure and 0 on success.
int (*stat)(DB *db, void *gsp, void *lsp,
void *(*db_malloc)(size_t));
A pointer to a function to create statistical struc-
tures and copy pointers to them into user-specified
memory locations.
In the presence of multiple threads or processes
accessing an active database, the returned informa-
tion can be out-of-date. This function may access
all of the pages in the database, and therefore may
incur a severe performance penalty and have obvious
negative effects on the underlying buffer pool.
If gsp is non-NULL, a pointer to the global
statistics for the database is copied into the memory
location it references. If lsp is non-NULL, a
pointer to the per-DB-handle statistics for the
database is copied into the memory location it
references. Calls to the sync function aggregate local
statistics with global statistics and reinitialize
the local statistics to 0.
The statistical structures are created in allocated
memory. If db_malloc is non-NULL, it is called to
allocate the memory, otherwise, the library malloc(3)
function is used. The function db_malloc must match
the calling conventions of the malloc(3) library rou-
tine. The caller is responsible for deallocating
this memory.
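As an illustration of a function that matches the malloc(3)
calling convention, the sketch below wraps malloc(3) and
keeps a running total of the bytes requested. The names
counting_malloc and counting_malloc_total are hypothetical
and do not appear in the DB library.

```c
#include <stddef.h>
#include <stdlib.h>

/* Running total of bytes successfully allocated through the wrapper. */
static size_t stat_bytes_requested;

/* Same calling convention as malloc(3): takes a size, returns memory or NULL. */
void *counting_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        stat_bytes_requested += size;
    return p;
}

/* Report how many bytes have been handed out so far. */
size_t counting_malloc_total(void)
{
    return stat_bytes_requested;
}
```

Passing counting_malloc as the db_malloc argument would
route the statistics allocation through it. Because the
wrapper obtains its memory from malloc(3), the caller would
still release the returned structures with free(3).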
In the case of a btree or recno database, the global
statistics are stored in a structure of type
DB_BTREE_STAT (typedef'd in <db.h>). The following
fields will be filled in:
u_int32_t bt_pagesize;
Underlying tree page size.
u_int32_t bt_levels;
Number of levels in the tree.
u_int32_t bt_nrecs;
Number of data items in the tree (since there
may be multiple data items per key, this number
may not be the same as the number of keys).
u_int32_t bt_int_pg;
Number of tree internal pages.
u_int32_t bt_leaf_pg;
Number of tree leaf pages.
u_int32_t bt_dup_pg;
Number of tree duplicate pages.
u_int32_t bt_over_pg;
Number of tree overflow pages.
u_int32_t bt_free;
Number of pages on the free list.
u_int32_t bt_freed;
Number of pages made available for reuse because
they were emptied.
u_int32_t bt_int_pgfree;
Number of bytes free in tree internal pages.
u_int32_t bt_leaf_pgfree;
Number of bytes free in tree leaf pages.
u_int32_t bt_dup_pgfree;
Number of bytes free in tree duplicate pages.
u_int32_t bt_over_pgfree;
Number of bytes free in tree overflow pages.
u_int32_t bt_pfxsaved;
Number of bytes saved by prefix compression.
u_int32_t bt_split;
Total number of tree page splits (includes fast
and root splits).
u_int32_t bt_rootsplit;
Number of root page splits.
u_int32_t bt_fastsplit;
Number of fast splits. When sorted keys are
added to the database, the DB btree implementa-
tion will split left or right to increase the
page-fill factor. This number is a measure of
how often it was possible to make such a split.
u_int32_t bt_added;
Number of keys added.
u_int32_t bt_deleted;
Number of keys deleted.
u_int32_t bt_get;
Number of keys retrieved. (Note, when returned
as part of the global statistics, this value
will not reflect any keys retrieved when the
database was open for read-only access.)
u_int32_t bt_cache_hit;
Number of hits in tree fast-insert code. When
sorted keys are added to the database, the DB
btree implementation will check the last page
where an insert occurred before doing a full
lookup. This number is a measure of how often
the lookup was successful.
u_int32_t bt_cache_miss;
Number of misses in tree fast-insert code. See
the description of bt_cache_hit; this number is
a measure of how often the lookup failed.
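The page counts and free-byte counts above can be combined
into a rough fill factor. The sketch below is one possible
calculation, not something the DB library provides:
leaf_fill_factor is a hypothetical name, uint32_t stands in
for u_int32_t, and the formula assumes that bt_leaf_pgfree
is measured against the full bt_pagesize of each leaf page,
ignoring any per-page header overhead.

```c
#include <stdint.h>

/*
 * Approximate fraction of leaf-page bytes in use, computed from the
 * bt_pagesize, bt_leaf_pg and bt_leaf_pgfree statistics.
 * Returns 0.0 when there are no leaf pages.
 */
double leaf_fill_factor(uint32_t pagesize, uint32_t leaf_pg,
    uint32_t leaf_pgfree)
{
    double total = (double)pagesize * (double)leaf_pg;

    if (total <= 0.0)
        return 0.0;
    return 1.0 - (double)leaf_pgfree / total;
}
```

With a DB_BTREE_STAT pointer sp, a call might look like
leaf_fill_factor(sp->bt_pagesize, sp->bt_leaf_pg,
sp->bt_leaf_pgfree).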
In the case of a btree or recno database, the local
statistics are stored in a structure of type
DB_BTREE_LSTAT (typedef'd in <db.h>). The following
fields will be filled in:
u_int32_t bt_split;
Total number of tree page splits (includes fast
and root splits).
u_int32_t bt_rootsplit;
Number of root page splits.
u_int32_t bt_fastsplit;
Number of fast splits. When sorted keys are
added to the database, the DB btree implementa-
tion will split left or right to increase the
page-fill factor. This number is a measure of
how often it was possible to make such a split.
u_int32_t bt_added;
Number of keys added.
u_int32_t bt_deleted;
Number of keys deleted.
u_int32_t bt_get;
Number of keys retrieved.
u_int32_t bt_pgdeleted;
Number of pages deleted because they were emp-
tied.
u_int32_t bt_cache_hit;
Number of hits in tree fast-insert code. When
sorted keys are added to the database, the DB
btree implementation will check the last page
where an insert occurred before doing a full
lookup. This number is a measure of how often
the lookup was successful.
u_int32_t bt_cache_miss;
Number of misses in tree fast-insert code. See
the description of bt_cache_hit; this number is
a measure of how often the lookup failed.
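The bt_cache_hit and bt_cache_miss counters are easiest to
read as a ratio. The following sketch shows one way to
compute it; fast_insert_hit_ratio is a hypothetical name,
and uint32_t stands in for u_int32_t.

```c
#include <stdint.h>

/*
 * Fraction of fast-insert lookups that succeeded, given the
 * bt_cache_hit and bt_cache_miss counters.
 * Returns 0.0 when no lookups have been recorded.
 */
double fast_insert_hit_ratio(uint32_t hits, uint32_t misses)
{
    double total = (double)hits + (double)misses;

    return total > 0.0 ? (double)hits / total : 0.0;
}
```

The same function applies to either the global or the local
statistics structure, since both carry the two counters.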
ENVIRONMENT VARIABLES
The following environment variables affect the execution
of db_open:
DB_HOME
If the dbenv argument to db_open was initialized
using db_appinit, the environment variable DB_HOME
may be used as the path of the database home for the
interpretation of the file argument to db_open, as
described in db_appinit(3). Specifically, db_open is
affected by the configuration string value of
DB_DATA_DIR.
EXAMPLES
Some applications create short-lived databases that are
discarded or recreated when the system fails, and are
unconcerned with concurrent access and loss of data due to
catastrophic failure. Such applications may wish to use
the db_open functionality without other parts of the DB
library, and will only be concerned with the DB access
methods. The DB access methods will use the memory pool
subsystem, but the application is unlikely to be aware of
this. See the file examples/ex_access.c in the DB source
distribution for a C language code example of how such an
application might use the DB library.
ERRORS
The db_open function may fail and return errno for any of
the errors specified for the following DB and library
functions: close(2), fcntl(2), fstat(2), getpid(2),
mmap(2), munmap(2), open(2), read(2), unlink(2), abort(3),
calloc(3), db->sync, fflush(3), free(3), getenv(3),
isdigit(3), lock_get(3), lock_id(3), lock_put(3),
lock_vec(3), log_register(3), log_unregister(3), mal-
loc(3), memcpy(3), memp_close(3), memp_fclose(3),
memp_fget(3), memp_fopen(3), memp_fput(3), memp_fset(3),
memp_fsync(3), memp_open(3), memp_register(3), memset(3),
sigfillset(3), sigprocmask(3), stat(3), strcpy(3),
strdup(3), strerror(3), strlen(3), t->re_irec and
vsnprintf(3).
In addition, the db_open function may fail and return
errno for the following conditions:
[EAGAIN]
A lock was unavailable.
[EINVAL]
An invalid flag value or parameter was specified
(e.g., unknown database type, page size, hash func-
tion, recno pad byte, byte order) or a flag value or
parameter that is incompatible with the current file
specification.
TMPDIR If the dbenv argument to db_open was NULL or not
initialized using db_appinit, the environment
variable TMPDIR may be used as the directory in which to
create temporary files, as described in the db_open
section above.
There is a mismatch between the version number of
file and the software.
A re_source file was specified with either the
DB_THREAD flag or a non-NULL tx_info field in the
DB_ENV argument to db_open.
[ENOENT]
A non-existent re_source file was specified.
[EPERM]
Database corruption was detected. All subsequent
database calls (other than db->close) will return
EPERM.
The db->close function may fail and return errno for any
of the errors specified for the following DB and library
functions: close(2), fcntl(2), getpid(2), munmap(2),
open(2), unlink(2), abort(3), db->db_malloc, db->sync,
fflush(3), fprintf(3), free(3), getenv(3), isdigit(3),
lock_get(3), lock_put(3), lock_vec(3), log_put(3), mal-
loc(3), memcpy(3), memmove(3), memp_fget(3), memp_fput(3),
memp_fset(3), memset(3), realloc(3), sigfillset(3), sig-
procmask(3), snprintf(3), stat(3), strcpy(3), strdup(3),
strerror(3), strlen(3) and vsnprintf(3).
The db->cursor function may fail and return errno for any
of the errors specified for the following DB and library
functions: free(3).
In addition, the db->cursor function may fail and return
errno for the following conditions:
[EINVAL]
An invalid flag value or parameter was specified.
[EPERM]
Database corruption was detected. All subsequent
database calls (other than db->close) will return
EPERM.
The db->del function may fail and return errno for any of
the errors specified for the following DB and library
functions: db->db_malloc, fflush(3), fprintf(3), free(3),
lock_get(3), lock_put(3), lock_vec(3), log_put(3), mal-
loc(3), memcpy(3), memmove(3), memp_fget(3), memp_fput(3),
memp_fset(3), memset(3), realloc(3) and vsnprintf(3).
In addition, the db->del function may fail and return
errno for the following conditions:
[EAGAIN]
A lock was unavailable.
[EINVAL]
An invalid flag value or parameter was specified.
[EPERM]
Database corruption was detected. All subsequent
database calls (other than db->close) will return
EPERM.
In addition, the db->fd function may fail and return errno
for the following conditions:
[ENOENT]
The db->fd function was called for an in-memory
database, or no underlying file has yet been created.
[EPERM]
Database corruption was detected. All subsequent
database calls (other than db->close) will return
EPERM.
The db->get function may fail and return errno for any of
the errors specified for the following DB and library
functions: db->db_malloc, fflush(3), fprintf(3),
lock_get(3), lock_put(3), lock_vec(3), malloc(3), mem-
cpy(3), memp_fget(3), memp_fput(3), realloc(3) and
vsnprintf(3).
In addition, the db->get function may fail and return
errno for the following conditions:
[EAGAIN]
A lock was unavailable.
[EINVAL]
An invalid flag value or parameter was specified.
The DB_THREAD flag was specified to the db_open(3)
function and neither the DB_DBT_MALLOC nor the
DB_DBT_USERMEM flag was set in the DBT.
A record number of 0 was specified.
[EPERM]
Database corruption was detected. All subsequent
database calls (other than db->close) will return
EPERM.
The db->put function may fail and return errno for any of
the errors specified for the following DB and library
functions: db->db_malloc, fflush(3), fprintf(3), free(3),
lock_get(3), lock_put(3), lock_vec(3), log_put(3), mal-
loc(3), memcpy(3), memmove(3), memp_fget(3), memp_fput(3),
memp_fset(3), memset(3), realloc(3), t->bt_prefix and
vsnprintf(3).
In addition, the db->put function may fail and return
errno for the following conditions:
[EACCES]
An attempt was made to modify a read-only database.
[EAGAIN]
A lock was unavailable.
[EINVAL]
An invalid flag value or parameter was specified.
A record number of 0 was specified.
An attempt was made to add a record to a fixed-length
database that was too large to fit.
An attempt was made to do a partial put.
[EPERM]
Database corruption was detected. All subsequent
database calls (other than db->close) will return
EPERM.
[ENOSPC]
A btree exceeded the maximum btree depth (255).
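Several of the functions above can return EAGAIN when a
lock is unavailable. One possible response is a bounded
retry, sketched below. This is an assumption about
application policy rather than documented DB behavior: it
presumes that the operation is safe to repeat, and an
application running under transaction protection may need
to abort the transaction before trying again. The names
db_op_fn, retry_on_eagain and flaky_op are hypothetical.

```c
#include <errno.h>

/* Hypothetical callback: performs one DB operation and returns its result. */
typedef int (*db_op_fn)(void *arg);

/*
 * Call op(arg) until it returns something other than EAGAIN or until
 * max_tries attempts have been made. Returns the last result, or
 * EAGAIN if max_tries is not positive.
 */
int retry_on_eagain(db_op_fn op, void *arg, int max_tries)
{
    int ret = EAGAIN;
    int i;

    for (i = 0; i < max_tries && ret == EAGAIN; i++)
        ret = op(arg);
    return ret;
}

/*
 * Stand-in operation for demonstration only: reports EAGAIN until the
 * counter pointed to by arg reaches zero, then reports success.
 */
int flaky_op(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return EAGAIN;
    }
    return 0;
}
```

In a real program the callback would wrap a call such as
db->put and return its result unchanged, so that errors
other than EAGAIN reach the caller immediately.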
The db->sync function may fail and return errno for any of
the errors specified for the following DB and library
functions: close(2), fcntl(2), open(2), write(2),
abort(3), db->db_malloc, fflush(3), fprintf(3), free(3),
lock_get(3), lock_put(3), lock_vec(3), log_put(3), mal-
loc(3), memcpy(3), memmove(3), memp_fget(3), memp_fput(3),
memp_fset(3), memp_fsync(3), memset(3), realloc(3), str-
error(3), t->bt_prefix, t->re_irec and vsnprintf(3).
In addition, the db->sync function may fail and return
errno for the following conditions:
[EINVAL]
An invalid flag value or parameter was specified.
[EPERM]
Database corruption was detected. All subsequent
database calls (other than db->close) will return
EPERM.
The db->stat function may fail and return errno for any of
the errors specified for the following DB and library
functions: malloc(3).
BUGS
The access methods provide no guarantees about byte string
alignment, and applications are responsible for maintain-
ing any necessary alignment.
The name DBT is a mnemonic for ``data base thang'', and
was used because no one could think of a reasonable name
that wasn't already used somewhere else.
SEE ALSO
The Ubiquitous B-tree, Douglas Comer, ACM Comput. Surv.
11, 2 (June 1979), 121-138.
Prefix B-trees, Bayer and Unterauer, ACM Transactions on
Database Systems, Vol. 2, 1 (March 1977), 11-26.
The Art of Computer Programming Vol. 3: Sorting and
Searching, D.E. Knuth, 1968, pp 471-480.
Dynamic Hash Tables, Per-Ake Larson, Communications of the
ACM, April 1988.
A New Hash Package for UNIX, Margo Seltzer, USENIX Pro-
ceedings, Winter 1991.
Document Processing in a Relational Database System,
Michael Stonebraker, Heidi Stettner, Joseph Kalash,
Antonin Guttman, Nadene Lynn, Memorandum No. UCB/ERL
M82/32, May 1982.
db_archive(1), db_checkpoint(1), db_deadlock(1), db_dump(1),
db_load(1), db_recover(1), db(3), db_appinit(3), db_cursor(3),
db_dbm(3), db_lock(3), db_log(3), db_mpool(3), db_open(3),
db_txn(3)