By now most of you will probably have heard of the term NoSQL. It's a vague term that covers a lot of different types of database engines. The main classes of NoSQL databases are key/value stores, column databases, graph databases and document databases. Examples of key/value stores are Memcached and Redis, where data can only be stored and retrieved through a specific key. Column databases, such as
Cassandra and
HBase (which runs on top of Hadoop), are great for processing large amounts of data. Graph databases such as
Neo4j and OrientDB model the relations between entities. Apache CouchDB and MongoDB belong to the last category, document databases. We will be looking extensively at MongoDB in this article.
In a document database such as MongoDB, the smallest unit is a document. In MongoDB, documents are stored in collections, which in turn make up a database.
Documents are analogous to rows in a SQL table, but there is one big
difference: not every document needs to have the same structure. Each of
them can have different fields, which is a very useful feature in many
situations. Another feature of MongoDB is that fields in a document can
contain arrays and/or sub-documents (sometimes called nested or embedded
documents).
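As a rough illustration of such a structure, here is a document written as a plain JavaScript object literal. The field names (`title`, `tags`, `author`) are made up for this sketch, and nothing here talks to a MongoDB server; it only shows how an array and an embedded sub-document can sit inside a single document.

```javascript
// A hypothetical document; the field names are illustrative only.
const article = {
    title: "Introduction to Document Databases with MongoDB",
    // An array field: one key holding several values.
    tags: ["mongodb", "nosql"],
    // A sub-document (also called a nested or embedded document).
    author: {
        name: "Derick Rethans",
        twitter: "derickr"
    }
};

// Nested values are reached with ordinary property access.
console.log(article.tags.length);
console.log(article.author.twitter);
```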
MongoDB's Strengths
Supporting a different set of fields for each document in a collection is
one of MongoDB's features. It allows you to store similar data with
different properties in the same collection. A good example of this is
storing real (not MongoDB) documents for a Content Management System
(CMS). The CMS might want to store articles, which have certain properties
(e.g. author, tags, and body), but also related books, which have
additional properties such as an ISBN, but no body field. An article may
need to store the periodical's ISSN in lieu of an ISBN. In a relational
database there are various ways to solve this. Most frequently it is
solved either by having a table per object "class" (article or book) or by
coming up with a scheme that stores an object's properties in linked
tables (for example through the entity-attribute-value, or EAV, pattern).
In MongoDB you would simply store the article and book with the fields
they need:
{
    _id: ObjectId("51156a1e056d6f966f268f81"),
    type: "Article",
    author: "Derick Rethans",
    title: "Introduction to Document Databases with MongoDB",
    date: ISODate("2013-04-24T16:26:31.911Z"),
    body: "This arti…"
},
{
    _id: ObjectId("51156a1e056d6f966f268f82"),
    type: "Book",
    author: "Derick Rethans",
    title: "php|architect's Guide to Date and Time Programming with PHP",
    isbn: "978-0-9738621-5-7"
}
Even though the two documents represent different classes of
objects, you can still construct a query that looks for all the items by
an author, or for all the items with a specific title.
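To make that concrete, the sketch below imitates such a query in plain JavaScript, without a MongoDB server. The `matchesExample` and `findByExample` helpers are hypothetical and only mimic simple equality matching on top-level fields. In the MongoDB shell the equivalent would be along the lines of `db.content.find( { author: "Derick Rethans" } )`, where `content` is an assumed collection name.

```javascript
// The two documents from above, reduced to plain values
// (ObjectId and ISODate are left out to keep the sketch self-contained).
const content = [
    {
        type: "Article",
        author: "Derick Rethans",
        title: "Introduction to Document Databases with MongoDB",
        body: "This arti…"
    },
    {
        type: "Book",
        author: "Derick Rethans",
        title: "php|architect's Guide to Date and Time Programming with PHP",
        isbn: "978-0-9738621-5-7"
    }
];

// Hypothetical helper: true when every field in the query equals
// the same field in the document (simple equality only).
function matchesExample(doc, query) {
    return Object.keys(query).every((key) => doc[key] === query[key]);
}

// Hypothetical helper: returns all documents that match the query.
function findByExample(collection, query) {
    return collection.filter((doc) => matchesExample(doc, query));
}

// Both the article and the book are found, despite their different fields.
const byAuthor = findByExample(content, { author: "Derick Rethans" });
console.log(byAuthor.map((doc) => doc.type));

// A query on a field that only the book has returns just the book.
const byIsbn = findByExample(content, { isbn: "978-0-9738621-5-7" });
console.log(byIsbn.map((doc) => doc.type));
```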
Data Model
Each document in a collection in MongoDB can look totally
different, and how you structure your documents is up to you. MongoDB
doesn't enforce a schema, but your application still should. Although
MongoDB is generally very fast, the way you structure and index your
documents and collections has a big influence on the performance of
your application. While designing your schema you should focus more on
how the data is inserted, updated and queried and less on how the data
is structured. If you sometimes need to denormalise your data, then that
is a totally normal thing to do, even though it might look dirty at
first.
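One way an application can enforce its own schema is to validate documents before inserting them. The following plain JavaScript sketch is only an illustration: the `requiredFields` map and the `validate` function are hypothetical, and the required fields per type are assumptions based on the article and book examples above.

```javascript
// Hypothetical per-type schema: the fields each kind of document must have.
const requiredFields = new Map([
    ["Article", ["author", "title", "body"]],
    ["Book", ["author", "title", "isbn"]]
]);

// Returns a list of problems; an empty list means the document is acceptable.
function validate(doc) {
    const required = requiredFields.get(doc.type);
    if (required === undefined) {
        return ["unknown type: " + doc.type];
    }
    return required
        .filter((field) => !(field in doc))
        .map((field) => "missing field: " + field);
}

const book = {
    type: "Book",
    author: "Derick Rethans",
    title: "php|architect's Guide to Date and Time Programming with PHP"
};

// The ISBN is missing, so the application should refuse to insert this.
console.log(validate(book));
```

The database itself would accept the incomplete book without complaint, so a check like this has to live in the application.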
Interactions Between Collections
MongoDB makes different choices regarding functionality and
scaling than relational databases. MongoDB is very easy to scale through
replication and sharding, but in exchange it gives up features like joins
and transactions. Operations in MongoDB are only atomic per single
document, and only operate on one collection. Not allowing operations
between collections (joins) sounds like a real issue, but with the support
of arrays and sub-documents this is in most cases not a problem. Let's
have a look at the following example:
Take an application where we store image (meta) data and tags
that go with those images. In a relational database you would store that
in three different tables:
Images
id | filename  | mimetype  | size
1  | cow.jpg   | image/jpg | 9123
2  | bunny.png | image/png | 8192
Tags
id | value
1  | animal
2  | cute
3  | tasty
ImageTags
image_id | tag_id
1        | 1
1        | 3
2        | 1
2        | 2
The queries for both the metadata and the tags for the bunny (id = 2) are as follows:
SELECT *
FROM Images
WHERE id = 2;

SELECT value
FROM ImageTags LEFT JOIN Tags ON (Tags.id = ImageTags.tag_id)
WHERE ImageTags.image_id = 2;
This is quite complex as you can see. There are three tables,
and two queries involved. In MongoDB, you might store the same data as:
Images
{
    _id: 1,
    filename: 'cow.jpg',
    mimetype: 'image/jpg',
    size: 9123,
    tags: [ 'animal', 'tasty' ]
},
{
    _id: 2,
    filename: 'bunny.png',
    mimetype: 'image/png',
    size: 8192,
    tags: [ 'animal', 'cute' ]
}
To provide the same results as with the two SQL queries above, you would run the following in the MongoDB shell:
db.Images.find( { _id: 2 } );
And on top of that, you have all the data right in one place ready for display.
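The embedded `tags` array can also be queried directly: in MongoDB, a query such as `db.Images.find( { tags: 'animal' } )` matches every document whose `tags` array contains that value. The sketch below imitates this behaviour in plain JavaScript, without a server. `findImages` is a hypothetical helper that covers only this one case.

```javascript
// The two image documents from above as plain JavaScript objects.
const images = [
    { _id: 1, filename: 'cow.jpg', mimetype: 'image/jpg', size: 9123, tags: ['animal', 'tasty'] },
    { _id: 2, filename: 'bunny.png', mimetype: 'image/png', size: 8192, tags: ['animal', 'cute'] }
];

// Hypothetical helper: a field matches when it equals the wanted value or,
// for array fields, when the array contains the wanted value.
function findImages(collection, field, wanted) {
    return collection.filter((doc) => {
        const value = doc[field];
        return Array.isArray(value) ? value.includes(wanted) : value === wanted;
    });
}

// All images tagged "animal", then all images tagged "cute".
console.log(findImages(images, 'tags', 'animal').map((doc) => doc.filename));
console.log(findImages(images, 'tags', 'cute').map((doc) => doc.filename));
```

In the relational layout, the same lookup by tag would need a join across all three tables.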
Most examples for MongoDB show documents as JSON. This is not how MongoDB
stores them internally (it uses a binary format called BSON), but it is a
good representation of how MongoDB deals with documents. For use within
PHP, you would convert both objects and arrays to PHP arrays. The above
can be translated to PHP like so:
$doc1 = array(
    '_id' => 1,
    'filename' => 'cow.jpg',
    'mimetype' => 'image/jpg',
    'size' => 9123,
    'tags' => array( 'animal', 'tasty' )
);
Or, if you use PHP 5.4 or later, you can use the short array syntax:
$doc1 = [
    '_id' => 1,
    'filename' => 'cow.jpg',
    'mimetype' => 'image/jpg',
    'size' => 9123,
    'tags' => [ 'animal', 'tasty' ]
];
PHP 5.4's short array syntax can come in quite handy when dealing with MongoDB documents with nested arrays and objects.
Getting Started
MongoDB can be downloaded for free from
http://mongodb.org/downloads. If you are on Debian or Ubuntu, I strongly
advise following the specific instructions for installing from packages,
because they make updating easy. After installing, make sure that MongoDB
works by running the following on the command line:
mongo test
This opens up a shell-like interface for the test database. If that works, you can issue commands in JavaScript syntax such as:
db.persons.insert( { 'name': 'Derick Rethans', 'twitter': 'derickr' } );
db.persons.find( { 'twitter': 'derickr' } );
In order to use MongoDB from PHP, you also need to install the
PHP driver for MongoDB. In most situations you should be able to do so by running:
pecl install mongo
Please refer to the
PECL manual for further installation instructions.
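The PHP equivalent of the previous shell example is:
<?php
// Connect to the MongoDB server on localhost (the driver's default)
$m = new MongoClient();
// Select the "test" database and its "persons" collection
$db = $m->test;
$col = $db->persons;
$col->insert( array( 'name' => 'Derick Rethans', 'twitter' => 'derickr' ) );
foreach ( $col->find( array( 'twitter' => 'derickr' ) ) as $record )
{
    var_dump( $record );
}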
The PHP documentation also includes a section on working with the MongoDB driver, including a tutorial. A handy cheat sheet gives you a quick overview of how to map SQL queries to the MongoDB query syntax.
Closing Words
MongoDB is not a straight replacement for your relational database.
Questions such as "How do I convert my relational database to MongoDB?"
make little sense, because writing applications with MongoDB requires a
different approach. That doesn't mean that MongoDB is not a general
purpose database; it can replace a relational database in almost every
situation. You just need to approach it differently, and when you do so
you should find working with MongoDB a breeze. Try it out, and stay tuned
for future articles!
London, UK
This article originally appeared in the June 2013 issue of Web & PHP.