Using Teradata sucks. There, I said it. It's painful to convene the
data committee just to add a column to the data warehouse, then do a
cost/benefit analysis for every meager gigabyte of data you want to
store, because the thing costs more than my house.
But what are your alternatives? I don't mean Netezza or Exadata,
which are different flavors of the same pain. I mean something
genuinely different. In the brave new world of Hadoop, you have Hive,
which is slow, and Impala, which doesn't scale well. Much of the time,
neither is a viable data warehouse replacement.
However, SonarW could be. It promises to keep the “shall we
add this column” committee at bay. It comes from JSonar, the company
that MongoDB users know for JSONStudio. It is built from the ground up
on JSON and is compatible with MongoDB; anything that talks to MongoDB
talks to SonarW.
But there's more to SonarW. Like Hive and Impala, SonarW can use
HDFS, the Hadoop Distributed File System, to scale. Yet SonarW should
perform far better than either Hive or Impala.
In architecture and speed, SonarW resembles massively parallel
processing (MPP) data warehouses. In the demo I attended, it ran fast
on a single machine, and the company claims it runs even better on
many. That matters, because if you've had any experience with Hadoop,
you know that job scheduling overhead is a pain point. Returning 200
rows from one table takes far too long, and many of your workloads,
even on a big data project, are not that large, especially while you
are still setting up the main part of the job.
Put another way, Hadoop always tries to maximize resource
utilization. But sometimes you just need to grab something fast, and
you don't need 100 nodes to do it.
SonarW can, of course, connect to standard business intelligence
tools, but then you lose some of the advantages of MongoDB's
aggregation framework and pipelining. At the same time, data warehouse
users are typically not familiar with JSON tools or MongoDB's
aggregation framework.
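To make that gap concrete, here is a minimal sketch in Python of the same question asked two ways: as the SQL a warehouse analyst would write, and as a MongoDB aggregation pipeline, the list of stages (such as `$match` and `$group`) that a driver like PyMongo passes to `collection.aggregate()`. The `sales` data and its field names are hypothetical, and `run_pipeline` is a hypothetical toy evaluator that supports only the two stages shown. It stands in for a real MongoDB or SonarW server and says nothing about how either actually executes pipelines.

```python
"""Toy illustration: one question in SQL and as a MongoDB aggregation pipeline.

The 'sales' data and field names are hypothetical. run_pipeline() is a
deliberately tiny local evaluator, not MongoDB's or SonarW's engine.
"""

# What a data warehouse analyst would typically write.
SQL = """
SELECT region, SUM(amount) AS total
FROM sales
WHERE year = 2015
GROUP BY region
"""

# Roughly the same question as an aggregation pipeline: each stage feeds
# its output documents to the next. With PyMongo you would pass this list
# to collection.aggregate(PIPELINE) against a live server.
PIPELINE = [
    {"$match": {"year": 2015}},                                     # like WHERE
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},  # like GROUP BY + SUM
]


def run_pipeline(docs, pipeline):
    """Apply stages in order; supports equality-only $match and field-only $sum in $group."""
    for stage in pipeline:
        if "$match" in stage:
            cond = stage["$match"]
            # Keep documents whose fields equal every value in the condition.
            docs = [d for d in docs if all(d.get(k) == v for k, v in cond.items())]
        elif "$group" in stage:
            spec = stage["$group"]
            key_field = spec["_id"].lstrip("$")  # "$region" -> "region"
            groups = {}
            for d in docs:
                key = d.get(key_field)
                out = groups.setdefault(key, {"_id": key})
                for name, acc in spec.items():
                    if name == "_id":
                        continue
                    field = acc["$sum"].lstrip("$")  # "$amount" -> "amount"
                    out[name] = out.get(name, 0) + d.get(field, 0)
            docs = list(groups.values())
        else:
            raise ValueError(f"unsupported stage: {stage}")
    return docs


if __name__ == "__main__":
    sales = [
        {"region": "east", "year": 2015, "amount": 10},
        {"region": "west", "year": 2015, "amount": 5},
        {"region": "east", "year": 2015, "amount": 7},
        {"region": "east", "year": 2014, "amount": 100},
    ]
    print(run_pipeline(sales, PIPELINE))
    # [{'_id': 'east', 'total': 17}, {'_id': 'west', 'total': 5}]
```

The stage order carries meaning: `$match` runs first, so `$group` only ever sees the 2015 documents, much as the `WHERE` clause filters rows before `GROUP BY` in the SQL version. An analyst fluent in the second form but not the first is exactly the gap SonarW has to bridge.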
That gap between the data warehouse world and the MongoDB/JSON world
is the key challenge for SonarW. The company's answer is SQL
compatibility via a plug-in for MariaDB's MaxScale. That lets you
connect all your favorite SQL tools that talk to MySQL or MariaDB
(which includes anything that uses ODBC or JDBC) to SonarW.
SonarW is hardly the only provider looking to bridge data warehousing
and Hadoop via SQL or OLAP; AtScale is another. My inbox is full of
such announcements, many claiming to be the first to do so (they are
not).
Even MongoDB analytics is starting to become a field of its own. That
raises the question: Are there enough paying MongoDB customers who
will use it for analytics to support SonarW's offering?
What could work to SonarW's advantage is its simplicity and
lower cost (starting at $15,000 per terabyte) compared to traditional
data warehouses and MPP systems. That might motivate even
non-MongoDB-oriented companies to at least kick the tires.
However, I suspect that those who are on Teradata are stuck
on Teradata. Moving from an entrenched technology means retraining staff
and paying for migration -- it's usually easier to keep paying your
dealer than go to rehab.
Even so, maybe there is room for a rebel base in these organizations:
an alliance of the NoSQL team, analysts who are willing to learn
something new, and managers who don't want to blow their budget on a
few more bytes in Teradata, Netezza, or Exadata.
This story, "A better mousetrap: A JSON data warehouse takes on Hadoop," was originally published by InfoWorld.