SQL or NoSQL technologies such as Hadoop or Cassandra. We do use some less-than-conventional storage technologies such as CouchDB and Redis.
A strong recommendation is that you master the fundamentals and prove out your thesis in a slightly less complex environment before migrating to an inherently more complex distributed system, and then be ready to make major adjustments to your algorithms to make them performant once data access is no longer local. A good option to investigate if you want to go this route is Dumbo. Stay tuned to this book’s Twitter account (@SocialWebMining) for extended examples that involve Dumbo.
MySQL, NoSQL, Hadoop or Cassandra, CouchDB and Redis
- It does not use SQL as its query language
- NoSQL database systems rose alongside major internet companies, such as Google, Amazon, and Facebook, which had significantly different challenges in dealing with huge quantities of data that the traditional RDBMS solutions could not cope with. NoSQL database systems are developed to manage large volumes of data that do not necessarily follow a fixed schema. Data is partitioned among different machines (for performance reasons and size limitations) so JOIN operations are not usable and ACID guarantees are not given.
- It may not give full ACID guarantees
- Usually only eventual consistency is guaranteed or transactions limited to single data items. This means that given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system.
- It has a distributed, fault-tolerant architecture
- Several NoSQL systems employ a distributed architecture, with the data held redundantly on several servers. In this way, the system can scale out by adding more servers, and the failure of a server can be tolerated. This type of database typically scales horizontally and is used for managing large amounts of data when performance and real-time behavior are more important than consistency (for example, indexing a large number of documents, serving pages on high-traffic websites, and delivering streaming media).
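The eventual-consistency and redundancy ideas in the list above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Replica` and `anti_entropy` are hypothetical names, and the last-writer-wins rule with integer versions is an assumed conflict policy, not the mechanism of any particular system mentioned in these notes.

```python
class Replica:
    """One server holding a full copy of the data (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Assumed conflict rule: last writer wins, i.e. keep the highest version.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Push every entry of every replica to all replicas.

    Once no new writes arrive, a single pass leaves all replicas
    with the newest version of each key.
    """
    for source in replicas:
        for key, (version, value) in list(source.store.items()):
            for target in replicas:
                target.write(key, value, version)


# Two clients update the same key on different servers.
a, b, c = Replica("a"), Replica("b"), Replica("c")
a.write("user:1", "alice", version=1)
b.write("user:1", "alicia", version=2)

stale = c.read("user:1")      # replica c has not seen any update yet
anti_entropy([a, b, c])       # updates propagate once writes stop
converged = [r.read("user:1") for r in (a, b, c)]
```

Before the synchronization pass, a read can return stale or missing data depending on which server answers; afterwards all three replicas agree, which is the only guarantee eventual consistency gives.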
NoSQL database systems are often highly optimized for retrieval and append operations and often offer little functionality beyond record storage (for example, key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated for by significant gains in scalability and performance for certain data models.
In short, NoSQL database management systems are useful when working with a huge quantity of data whose nature does not require a relational model. The data may be structured, but that structure is of minimal importance; what really matters is the ability to store and retrieve great quantities of data, not the relationships between the elements. Typical uses are storing millions of key-value pairs in one or a few associative arrays, or storing millions of data records. This is particularly useful for statistical or real-time analyses of growing lists of elements (such as Twitter posts or the Internet server logs from a large group of users).
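As a minimal sketch of the "retrieve and append" access pattern described above, the following in-memory store keeps a growing list of records per key. The class name `AppendOnlyStore`, its methods, and the sample keys are hypothetical and chosen only for illustration; a real key-value store would add persistence and partitioning across machines.

```python
from collections import defaultdict


class AppendOnlyStore:
    """Toy key-value store that only supports append and retrieve."""

    def __init__(self):
        self._records = defaultdict(list)  # key -> list of records

    def append(self, key, record):
        # Records are only ever added, never updated in place.
        self._records[key].append(record)

    def get(self, key):
        # Return a copy so callers cannot modify the stored list.
        return list(self._records.get(key, []))

    def count(self, key):
        return len(self._records.get(key, []))


store = AppendOnlyStore()
store.append("user:42", "first post")
store.append("user:42", "second post")
store.append("user:7", "hello")

posts = store.get("user:42")
```

Note that nothing here relates one key to another: there is no JOIN, only lookup by key, which is the trade-off the paragraph above describes.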
Apache Software Foundation
Latest stable release 1.0.0
December 27, 2011; 5 months ago
Type Distributed file system
Written in Java
Operating system Cross-platform
License Apache License 2.0
Current status Active
Apache Hadoop is a software framework, released under a free license, that supports distributed applications. It allows applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's papers on MapReduce and the Google File System (GFS).
Hadoop is a top-level Apache project that is being built and used by a global community of contributors, using the Java programming language. Yahoo! has been the largest contributor to the project and uses Hadoop extensively in its business.
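To make the MapReduce model that inspired Hadoop more concrete, here is a single-process word count in plain Python. It is a sketch of the programming model only and does not use Hadoop or its API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and on a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big clusters"])
```

Because the map step looks at one line at a time and the reduce step at one key at a time, both can be distributed over machines without sharing state, which is what lets the model scale to very large inputs.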
Welcome to Apache Cassandra
Redis is an in-memory database engine based on hash-table (key, value) storage that can optionally be used as a durable or persistent database. It is written in ANSI C by Salvatore Sanfilippo, who is sponsored by VMware, and is released under the BSD license, so it is considered open source software.
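The idea of an in-memory key-value store with optional persistence can be sketched in a few lines of Python. This is not how Redis is implemented and it does not use the Redis client library; `SnapshotStore`, its methods, and the JSON snapshot file are assumptions made purely for illustration.

```python
import json
import os
import tempfile


class SnapshotStore:
    """Toy in-memory key-value store with an optional on-disk snapshot."""

    def __init__(self, path=None):
        self.path = path  # None means purely in-memory, nothing is persisted
        self.data = {}
        if path and os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.data = json.load(fh)  # restore the last snapshot

    def set(self, key, value):
        self.data[key] = value  # writes only touch memory

    def get(self, key, default=None):
        return self.data.get(key, default)

    def save(self):
        """Write a snapshot to disk; returns False when persistence is off."""
        if self.path is None:
            return False
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(self.data, fh)
        os.replace(tmp, self.path)  # swap the new snapshot into place
        return True


with tempfile.TemporaryDirectory() as folder:
    snapshot = os.path.join(folder, "dump.json")
    store = SnapshotStore(snapshot)
    store.set("session:1", "alice")
    saved = store.save()
    restored = SnapshotStore(snapshot).get("session:1")
```

All reads and writes are served from the dictionary in memory; durability is an explicit, separate step, so anything written after the last snapshot is lost if the process stops.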
Meebo, for their social platform (web and applications)