A database is an information set with a regular structure. A database is usually, but not necessarily, stored in some machine-readable format accessed by a computer. There is a wide variety of databases, from simple tables stored in a single file to very large databases with many millions of records, stored in rooms full of disk drives.
Databases resembling modern versions were first developed in the 1960s. A pioneer in the field was Charles Bachman.
The most useful way of classifying databases is by the programming model associated with the database. Several models have been in wide use for some time. Historically, the hierarchical model was implemented first, then the network model; the relational model then came to dominate, with the so-called flat model accompanying it for low-end usage. The hierarchical, network, and flat models were never given a formal theory and were regarded as data models only by contrast with the relational model, having no conceptual underpinnings of their own; they arose simply out of physical constraints and out of programming models rather than data models.
The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, columns for name and password might be used as a part of a system security database. Each row would have the specific password associated with a specific user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers. This model is the basis of the spreadsheet.
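The flat model described above can be sketched in a few lines of Python. This is only an illustration: the column names, the sample rows, and the helpers `check_types` and `lookup` are assumptions made for the example, not part of any particular product.

```python
from datetime import date

# Hypothetical schema: column name -> type every value in that column must have.
COLUMNS = {"name": str, "password": str, "created": date, "failed_logins": int}

# The flat table itself: a single two-dimensional array, one row per user.
rows = [
    ("alice", "s3cret", date(2001, 5, 17), 0),
    ("bob", "hunter2", date(2002, 1, 3), 2),
]


def check_types(table):
    """Return True if every value matches the type declared for its column."""
    types = list(COLUMNS.values())
    return all(
        len(row) == len(types)
        and all(isinstance(value, t) for value, t in zip(row, types))
        for row in table
    )


def lookup(table, column, value):
    """Return every row whose given column equals value (a linear scan)."""
    index = list(COLUMNS).index(column)
    return [row for row in table if row[index] == value]
```

Calling `lookup(rows, "name", "bob")` returns the single row holding the password associated with that user, which is all the structure a flat table offers.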
The network model allows multiple tables to be used together through the use of pointers (or references). Some columns contain pointers to different tables instead of data. Thus, the tables are related by references, which can be viewed as a network structure. A particular subset of the network model, the hierarchical model, limits the relationships to a tree structure, instead of the more general directed graph structure implied by the full network model.
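The difference between the network and hierarchical models can be illustrated with a small sketch in which Python object references stand in for the pointers. The `Record` class, the sample records, and the `is_tree` helper are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    name: str
    links: list = field(default_factory=list)  # pointers to other records


def is_tree(root):
    """True if each record reachable from root is reached by exactly one path."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:  # reached a second time: shared record or cycle
            return False
        seen.add(id(node))
        stack.extend(node.links)
    return True


# Network model: the same employee record is pointed to by two departments.
alice = Record("Alice")
sales = Record("Sales", [alice])
research = Record("Research", [alice])
network_root = Record("Company", [sales, research])

# Hierarchical model: every record has exactly one parent.
hierarchy_root = Record(
    "Company",
    [Record("Sales", [Record("Alice")]), Record("Research", [Record("Bob")])],
)
```

Here `is_tree(hierarchy_root)` holds, while `is_tree(network_root)` does not, because the shared record turns the structure into a general directed graph.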
The relational model consists of three components:
relations of n-tuples (tables of rows) of data elements (or attributes or columns); each n-tuple (row) is a collection of data elements (attributes/columns) of the entity represented by that particular n-tuple (row);
a collection of operators, the relational algebra and calculus;
and a collection of integrity constraints, defining the set of consistent database states and changes of state. The integrity constraints can be of four types: domain (AKA type), attribute, relvar and database constraints.
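As a rough illustration of integrity constraints, the sketch below uses SQLite through Python's standard `sqlite3` module. The table names, columns, and the `try_insert` helper are assumptions made for the example, and SQL constraints correspond only loosely to the four constraint types named above: the `CHECK` clause restricts a single attribute, the primary key restricts one relation, and the foreign key spans two relations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript("""
    CREATE TABLE department (dept TEXT PRIMARY KEY);
    CREATE TABLE employee (
        name   TEXT PRIMARY KEY,                      -- no two rows share a name
        salary INTEGER NOT NULL CHECK (salary >= 0),  -- restricts one attribute
        dept   TEXT NOT NULL REFERENCES department(dept)  -- spans two tables
    );
    INSERT INTO department VALUES ('sales');
""")


def try_insert(name, salary, dept):
    """Insert one employee; return False if a constraint rejects the row."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO employee VALUES (?, ?, ?)", (name, salary, dept)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A row such as `("alice", 100, "sales")` is accepted, while a negative salary, an unknown department, or a repeated name is rejected, so the database never enters an inconsistent state.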
Unlike the hierarchical and network models, there are no explicit pointers whatsoever in the data held in the relational model. In the hierarchical and network models, the programmer accesses data by specifying an access path that follows pointers embedded in the data. In the relational model, data is accessed using relational algebra. Subsets of n-tuples (rows) in different relations (tables) can be joined via cross-products, intersected, and differenced using the values of any of the attributes (columns). This flexibility allows users (and programmers) to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for decades. This has made relational databases very popular with businesses.
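The operations mentioned above can be sketched with ordinary Python sets. This is a toy model under stated assumptions: each n-tuple is represented as a frozenset of (attribute, value) pairs, and `relation` and `join` are hypothetical helpers written for this example. Intersection and difference are simply the built-in set operators.

```python
def relation(*rows):
    """Build a relation: a set of n-tuples, each a frozenset of (attribute, value)."""
    return {frozenset(row.items()) for row in rows}


def join(r, s):
    """Natural join: pair up tuples that agree on every attribute they share."""
    result = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                result.add(frozenset({**da, **db}.items()))
    return result


employees = relation(
    {"emp": "alice", "dept": "sales"},
    {"emp": "bob", "dept": "research"},
)
departments = relation(
    {"dept": "sales", "floor": 1},
    {"dept": "research", "floor": 2},
)
on_call = relation({"emp": "alice", "dept": "sales"})

joined = join(employees, departments)   # matched purely by attribute values
both = employees & on_call              # intersection
not_on_call = employees - on_call       # difference
```

No tuple holds a pointer to another; `join` matches rows only by comparing the values of the shared `dept` attribute, which is why a query nobody planned for can still be expressed.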
Any number of declarative programming languages could be invented to give users the means of specifying the relational algebra needed to access and manipulate the data in relational databases. The de facto standard is Structured Query Language (SQL), although every RDBMS has its own dialect of this English-like declarative programming language.
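As a small example of SQL in use, the sketch below runs a join query against an in-memory SQLite database through Python's standard `sqlite3` module. The tables and data are invented for the illustration, and SQLite's dialect is only one of many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT PRIMARY KEY, dept TEXT NOT NULL);
    CREATE TABLE department (dept TEXT PRIMARY KEY, floor INTEGER NOT NULL);
    INSERT INTO employee VALUES ('alice', 'sales'), ('bob', 'research');
    INSERT INTO department VALUES ('sales', 1), ('research', 2);
""")

# The query states what is wanted (a join restricted by a condition),
# not how to navigate to it; the DBMS chooses the access path.
query = """
    SELECT e.name, d.floor
    FROM employee AS e
    JOIN department AS d ON e.dept = d.dept
    WHERE d.floor > ?
    ORDER BY e.name
"""
rows = conn.execute(query, (1,)).fetchall()
```

With the sample data, `rows` contains one pair, the employee `bob` on floor `2`.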
The relational model is an application of the relational algebra and set theory branches of mathematics to the design and working of databases. Perhaps the most important pioneer in this field was Ted Codd. Although this model is the basis for relational database management systems (RDBMSs), very few RDBMSs implement the model entirely rigorously or completely, and many have extra features which, if used, violate the theory. Some so-called RDBMSs are not relational enough to be worthy of the term; they are DBMSs with relational features.
In recent years, the object-oriented paradigm has been applied to databases as well, creating a new programming model known as object databases. These databases attempt to overcome some of the difficulties of using objects with SQL DBMSs. An object-oriented program allows objects of the same type to have different implementations and behave differently, so long as they have the same interface (polymorphism). This doesn't fit well with a SQL database, where user-defined types are difficult to define and use, and where the Two Great Blunders prevail: the identification of classes with tables (the correct identification is of classes with types, and of objects with values), and the usage of pointers.
A variety of ways have been tried for storing objects in a database, but there is little consensus on how this should be done. Implementing object databases undoes the benefits of the relational model by introducing pointers and making ad-hoc queries more difficult. This is because they are essentially adaptations of obsolete network and hierarchical databases to object-oriented programming. As a result, object databases tend to be used for specialized applications, and general-purpose object databases have not been very popular. Instead, objects are often stored in SQL databases using complicated mapping software. At the same time, SQL DBMS vendors have added features to allow objects to be stored more conveniently, drifting even further away from the relational model.
The term "database application" usually refers to software providing a user interface to a database. The software that actually manages the data is usually called a database management system (DBMS) or (if it is embedded) a database engine.
A DBMS is generally expected to enforce the ACID rules for transactions:
Atomicity - either all or no operations are completed. (Transactions that can't be finished must be completely undone.)
Consistency - all transactions must leave the database in consistent state.
Isolation - transactions can't interfere with each other's work and incomplete work isn't visible to other transactions.
Durability - successful transactions must persist through crashes.
In practice, many DBMSs allow some of these rules to be relaxed for better performance.
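Atomicity and consistency can be demonstrated with a minimal sketch using SQLite through Python's standard `sqlite3` module. The `account` table and the `transfer` and `balances` helpers are assumptions made for the example; other DBMSs expose transactions through different interfaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        owner   TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO account VALUES ('alice', 100), ('bob', 0);
""")


def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; False if it is undone."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            # If this debit would overdraw src, the CHECK constraint fails
            # and the credit above is rolled back with it.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the current balances as a dict of owner -> balance."""
    return dict(conn.execute("SELECT owner, balance FROM account"))
```

A transfer of 500 from `alice` fails and leaves both balances exactly as they were, even though the credit to `bob` had already been applied inside the transaction; a transfer of 30 succeeds and both updates become visible together.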
Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
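One common way to test a schedule is to check conflict serializability with a precedence graph, sketched below. The schedule format and the function name are assumptions made for the example. Conflict serializability is a sufficient (not a necessary) condition for serializability, and this sketch does not check recoverability.

```python
def conflict_serializable(schedule):
    """schedule: list of (transaction, action, item), with action 'r' or 'w'.

    Builds a precedence graph with an edge T1 -> T2 whenever an operation of
    T1 precedes a conflicting operation of T2 (same item, at least one write),
    then reports whether that graph is free of cycles.
    """
    edges = {}
    for i, (t1, a1, x1) in enumerate(schedule):
        edges.setdefault(t1, set())
        for t2, a2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (a1, a2):
                edges[t1].add(t2)

    state = {}  # transaction -> "visiting" or "done"

    def has_cycle(node):
        state[node] = "visiting"
        for nxt in edges[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and has_cycle(nxt):
                return True
        state[node] = "done"
        return False

    return not any(has_cycle(t) for t in edges if t not in state)


serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
interleaved = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
```

The `serial` schedule passes the check. The `interleaved` one does not: T2 writes `x` between T1's read and write, so the graph contains both T1 -> T2 and T2 -> T1.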
The Paleobiology Database - A global, collection-based occurrence and taxonomic database for marine and terrestrial animals and plants of any geological age, as well as web-based software for statistical analysis of the data. http://paleodb.org/
Fossil Record 2 - Family level searchable database of fossil organisms. Create your own diversity plots of different groups. http://palaeo.gly.bris.ac.uk/frwhole/FR2.html
BUGS - A database of British Coleoptera (Arthropoda: Insecta), including information on habitat, distribution, and fossil occurrence, along with their bibliographic references. Database available for download in Microsoft Access format. http://www.ngdc.noaa.gov/paleo/insect.html
Neogene Marine Biota of Tropical America - Online biotic database containing images and data for taxa used in analyses of Tropical American biodiversity over the past 25 million years. http://porites.geology.uiowa.edu/
PaleoBase - On-Line Databases intended to provide authoritative references for common and stratigraphically important invertebrate macrofossils. http://www.paleobase.com/