Big data abounds in today's digital age. More and more people and organisations are shifting to the cloud and embracing big data like never before. Of course, this shift comes with its own risks, so tread carefully.
Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. It supports incremental loads of a single table or a free-form SQL query, as well as saved jobs, which can be run multiple times to import updates made to a database since the last import.
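The idea behind an incremental load can be illustrated with a minimal Python sketch. This is not Sqoop's own code: it only mimics the principle of remembering the highest value seen in a check column and fetching newer rows on the next run. The `orders` table, its columns and the `incremental_import` helper are hypothetical names chosen for illustration, and an in-memory SQLite database stands in for the relational source.

```python
import sqlite3


def incremental_import(conn, table, check_column, last_value):
    """Fetch rows whose check_column is greater than last_value.

    Returns the new rows and the updated high-water mark to store
    for the next run. Hypothetical helper, not part of Sqoop.
    """
    # Table and column names cannot be bound as parameters, so they are
    # assumed to come from trusted configuration in this sketch.
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {check_column} > ? ORDER BY {check_column}"
    )
    cursor = conn.execute(query, (last_value,))
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    index = columns.index(check_column)
    # Keep the old mark when nothing new has arrived.
    new_last_value = rows[-1][index] if rows else last_value
    return rows, new_last_value


# Stand-in for the source database (assumption: a simple orders table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany(
    "INSERT INTO orders (id, item) VALUES (?, ?)",
    [(1, "disk"), (2, "cable")],
)

# First run imports everything above the initial mark of 0.
first_batch, last_id = incremental_import(conn, "orders", "id", 0)

# A row is added after the first import; the second run picks up only that row.
conn.execute("INSERT INTO orders (id, item) VALUES (3, 'router')")
second_batch, last_id = incremental_import(conn, "orders", "id", last_id)
```

A saved job, in this picture, amounts to persisting the last value between runs so that each run resumes where the previous one stopped.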
Apache Giraph is an Apache project to perform graph processing on big data. Giraph utilises Apache Hadoop’s MapReduce implementation to process graphs.
Apache Hama is a distributed computing framework based on Bulk Synchronous Parallel (BSP) computing techniques for massive scientific computations such as matrix, graph and network algorithms.
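For readers unfamiliar with the Bulk Synchronous Parallel model, the following single-process Python sketch may help. It does not use Hama's API. It only simulates the three phases of each superstep (local computation, message exchange, barrier) to spread the largest value through a small network of peers. The function name `bsp_max` and the peer layout are assumptions made for illustration.

```python
def bsp_max(values, neighbours):
    """Spread the maximum value to every peer using BSP-style supersteps.

    values: dict mapping peer name to its starting number.
    neighbours: dict mapping peer name to the peers it can message.
    Returns the final per-peer values and the number of supersteps run.
    """
    state = dict(values)

    # Superstep 0: every peer sends its own value to its neighbours.
    inbox = {peer: [] for peer in state}
    for peer, value in state.items():
        for other in neighbours[peer]:
            inbox[other].append(value)
    supersteps = 1

    # Continue while any peer still has messages to process.
    while any(inbox.values()):
        outbox = {peer: [] for peer in state}
        # Local computation: a peer sees only messages from the previous superstep.
        for peer, messages in inbox.items():
            if messages and max(messages) > state[peer]:
                state[peer] = max(messages)
                # Communication: forward the improved value to neighbours.
                for other in neighbours[peer]:
                    outbox[other].append(state[peer])
        # Barrier: messages sent now become visible only in the next superstep.
        inbox = outbox
        supersteps += 1

    return state, supersteps


# Four peers connected in a line: a - b - c - d (hypothetical layout).
peers = {"a": 3, "b": 7, "c": 1, "d": 5}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result, steps = bsp_max(peers, links)
```

In a real cluster each peer would run on a separate machine and the barrier would be a network-wide synchronisation point; here swapping `inbox` for `outbox` plays that role.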
Cloudera Impala is Cloudera’s open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.
Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open-source version of Google’s Dremel system, which is available as an infrastructure service called Google BigQuery.
Neo4j is an open-source graph database implemented in Java. The developers describe Neo4j as an “embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables”.
Couchbase Server, originally known as Membase, is an open-source, distributed (shared-nothing architecture) NoSQL document-oriented database that is optimised for interactive applications. These applications must serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data.
SciDB is an array database designed for multidimensional data management and analytics common to scientific, geospatial, financial, and industrial applications.