After announcing the technology at its OpenWorld conference last month, Oracle Corp. (NASDAQ: ORCL) has launched its much-anticipated NoSQL database.
The Oracle NoSQL Database can now be downloaded from the Oracle Technology Network. The software will also be a key component of the Oracle Big Data Appliance, due to ship in the first quarter of 2012.
The database, which is built on Berkeley DB, would be of interest to “customers who are acquiring massive amounts of data who are unsure about the schema, who want more fluid capture of the data,” said Marie-Anne Neimat, vice president of database development at Oracle.
The company is responding to the growing number of databases introduced in the past few years that eschew the typical SQL architecture in order to scale out across many servers and improve performance.
Such a database may be useful for storing information such as data from service logs, sensors and meters as well as from social networks and personal information for e-commerce sites, the company claimed. A database of this sort would also be a good fit for large organizations that are already using Oracle databases, noted analyst Curt Monash of Monash Research. In many cases a relational database is not the best choice for tasks such as tracking Web interactions. “NoSQL in general deserves a place in Oracle shops, so it makes sense for Oracle to try to co-opt it,” he wrote in a blog posting.
NoSQL can also be used to handle non-essential data storage duties, taking some of the burden off more structured relational databases. Monash pointed to a recent database failure experienced by JPMorgan Chase. Because the company stored both time-sensitive financial transaction data and non-essential user information in the same database, financial transactions were slowed when a large number of users logged in to the Web site after a crash. Keeping the user data in a separate, possibly NoSQL, database might have prevented this problem.
The database is based on the Java edition of Berkeley DB, an open source database that originated at the University of California, Berkeley, and is widely used in embedded systems. The database uses a simple key-value data model, meaning that a program fetches a piece of data by supplying the key that uniquely identifies it.
Although it does not offer the ability to do nuanced, highly structured queries in the same way a SQL relational database would, the database doesn’t require a fixed underlying schema, so organizations can add new columns as new types of information need to be captured, Neimat said.
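To make the key-value model concrete, here is a minimal sketch in Python. The `KeyValueStore` class and its `put`, `get` and `delete` methods are hypothetical illustrations, not the Oracle NoSQL Database API (which is Java-based). It shows lookup by exact key only, with values that carry no fixed schema, so a later record can add a field without any table change.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, not Oracle's API)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The store treats values as opaque: no schema is checked.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Lookup is by exact key only; there is no query language.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user/1001", {"name": "Ada", "email": "ada@example.com"})
# A later record adds a field with no schema change.
store.put("user/1002", {"name": "Alan", "email": "alan@example.com",
                        "last_login": "2011-10-25"})
print(store.get("user/1001")["name"])  # Ada
```

The trade-off the article describes is visible here: fetching by key is trivial, but there is no way to ask for, say, every user who logged in on a given day without scanning all the values.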
The software lets administrators balance responsiveness against the time needed to reach consistency, the point at which an update has been fully stored on the nodes holding copies of the data.
“When an update is issued, it can be applied to a single node or the majority of nodes, or to all of them. That makes it easy for the user to manage consistency,” Neimat said. The database should scale nearly linearly, meaning capacity grows roughly in proportion to the number of servers added to the cluster. Oracle itself has built a 300-node cluster with the database, though theoretically there is no limit to the size of the cluster that could be built, Neimat said.
Keeping track of where all the data is stored falls to a client library that applications link against. The Java-based library routes each request to the node holding a copy of the requested data. Applications interact with the database through a Java API (application programming interface).
Primary keys themselves can have subkeys, which point to different fields within the same record. Subkeys are useful because they can be used to add data fields to existing records. “You can have flexibility in which attributes to have with which records. You’re not sure what you want to do with the data, but you do know you want to keep it and analyze it later,” Neimat said.
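The consistency trade-off Neimat describes can be sketched as a rule for how many replicas must acknowledge a write before it counts as successful. In this hypothetical Python model, `AckPolicy`, `Replica` and `write` are illustrative names, not Oracle's; the sketch captures only the success rule, not the latency savings that come from waiting on fewer replicas.

```python
from enum import Enum
from typing import Dict, List


class AckPolicy(Enum):
    ONE = "one"
    MAJORITY = "majority"
    ALL = "all"


def required_acks(policy: AckPolicy, replicas: int) -> int:
    """Number of replica acknowledgements a write must collect."""
    if policy is AckPolicy.ONE:
        return 1
    if policy is AckPolicy.MAJORITY:
        return replicas // 2 + 1
    return replicas


class Replica:
    """Hypothetical replica that may be unavailable."""

    def __init__(self, name: str, up: bool = True) -> None:
        self.name = name
        self.up = up
        self.data: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> bool:
        # A replica that is down cannot store or acknowledge the write.
        if not self.up:
            return False
        self.data[key] = value
        return True


def write(replicas: List[Replica], key: str, value: str,
          policy: AckPolicy) -> bool:
    """Send the update to every replica; succeed if enough acknowledge."""
    acks = sum(1 for r in replicas if r.apply(key, value))
    return acks >= required_acks(policy, len(replicas))


replicas = [Replica("a"), Replica("b"), Replica("c", up=False)]
for policy in AckPolicy:
    print(policy.value, write(replicas, "user/1001", "Ada", policy))
# prints: one True, majority True, all False
```

With one of three replicas down, a write still succeeds under the single-node and majority policies but fails when every replica must confirm it, which is the cost of the strongest setting.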
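One common way a client library can find the node that holds a key is to hash the key to a fixed number of partitions and keep a map from partitions to nodes. The Python sketch below assumes that scheme; `PartitionRouter`, the partition count and the node names are hypothetical, and the article does not say how Oracle's library assigns partitions.

```python
import hashlib
from typing import Dict, List


class PartitionRouter:
    """Hypothetical client-side router: key -> partition -> node."""

    def __init__(self, num_partitions: int, nodes: List[str]) -> None:
        self.num_partitions = num_partitions
        # Assign partitions to nodes round-robin; a real store would
        # typically rebalance this map when nodes join or leave.
        self.partition_to_node: Dict[int, str] = {
            p: nodes[p % len(nodes)] for p in range(num_partitions)
        }

    def partition_for(self, key: str) -> int:
        # Use a stable hash (Python's built-in hash() varies between runs).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_partitions

    def node_for(self, key: str) -> str:
        return self.partition_to_node[self.partition_for(key)]


router = PartitionRouter(num_partitions=12,
                         nodes=["node-1", "node-2", "node-3"])
print(router.node_for("user/1001"))
```

Hashing to partitions rather than directly to nodes means that when the cluster grows, whole partitions can be moved to new servers while every key keeps its partition.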
“All the records that share the same root key are all on the same partition, all on the same node,” Neimat said. “You can update multiple records, insert, retrieve, delete multiple records using the primary key.” Administrators can interact with the database through a Web console, which offers the ability to manage and monitor topology, as well as to set up load balancing across multiple nodes.
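Neimat's point that records sharing a root key live on the same partition can be illustrated by hashing only the root key when placing data. In this hypothetical Python sketch, a key is a `(root, subkey)` pair and `PartitionedStore` is an illustrative class, not Oracle's API. Because placement ignores the subkey, every field of one record sits in a single partition and can be read or written together, and a new attribute is simply a new subkey.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (root key, subkey): a hypothetical key layout


class PartitionedStore:
    """Hypothetical store that places records by root key only."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions: Dict[int, Dict[Key, str]] = defaultdict(dict)

    def partition_for(self, root: str) -> int:
        # Only the root key is hashed, so all subkeys of a root
        # land in the same partition.
        digest = hashlib.sha256(root.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_partitions

    def put(self, root: str, subkey: str, value: str) -> None:
        self.partitions[self.partition_for(root)][(root, subkey)] = value

    def put_many(self, root: str, fields: Dict[str, str]) -> None:
        """Write several subkeys of one root; all go to one partition."""
        target = self.partitions[self.partition_for(root)]
        for subkey, value in fields.items():
            target[(root, subkey)] = value

    def get_all(self, root: str) -> Dict[str, str]:
        """Fetch every subkey under one root from a single partition."""
        part = self.partitions[self.partition_for(root)]
        return {sub: v for (r, sub), v in part.items() if r == root}


store = PartitionedStore(num_partitions=12)
store.put_many("user/1001", {"name": "Ada", "email": "ada@example.com"})
# A new attribute is just a new subkey; no schema change is needed.
store.put("user/1001", "last_login", "2011-10-25")
print(sorted(store.get_all("user/1001")))  # ['email', 'last_login', 'name']
```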
The company will offer a free community version of the database, as well as a commercial version that will eventually be augmented with additional features. The company is promising that the installation will be polished to the degree that one would expect from Oracle, and that the company will offer full support for the paid editions.