Great list of resources: data science, visualization, machine learning, big data

Fantastic resource created by Andrea Motosi. I’ve only included the 5 categories that are the most relevant to our audience, though it has 31 categories total, including a few on distributed systems and Hadoop. Click here to view the 31 categories. You might also want to check out our internal resources (the first section below).

Source: Machine Learning and Face Recognition Papers
Data Science Central – Resources
Machine Learning
Apache Mahout: machine learning library for Hadoop
Ayasdi Core: tool for topological data analysis
brain: Neural networks in JavaScript
Cloudera Oryx: real-time large-scale machine learning
Concurrent Pattern: machine learning library for Cascading
convnetjs: Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser
Decider: Flexible and Extensible Machine Learning in Ruby
etcML: text classification with machine learning
Etsy Conjecture: scalable Machine Learning in Scalding
Google Sibyl: System for Large Scale Machine Learning at Google
H2O: statistical, machine learning and math runtime for Hadoop
IBM Watson: cognitive computing system
MLbase: distributed machine learning libraries for the BDAS stack
MLPNeuralNet: Fast multilayer perceptron neural network library for iOS and Mac OS X
nupic: Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms
PredictionIO: machine learning server built on Hadoop, Mahout and Cascading
scikit-learn: machine learning in Python
Spark MLlib: a Spark implementation of some common machine learning (ML) functionality
Sparkling Water: combine H2O's Machine Learning capabilities with the power of the Spark platform
Varaha: Machine learning and natural language processing with Apache Pig
Viv: global platform that enables developers to plug into and create an intelligent, conversational interface to anything
Vowpal Wabbit: learning system sponsored by Microsoft and Yahoo!
WEKA: suite of machine learning software
Wit: Natural Language for the Internet of Things
Wolfram Alpha: computational knowledge engine
Visualization
Arbor: graph visualization library using web workers and jQuery
CartoDB: open-source or freemium hosting for geospatial databases with powerful front-end editing capabilities and a robust API
Chart.js: open source HTML5 Charts visualizations
Crossfilter: JavaScript library for exploring large multivariate datasets in the browser. Works well with dc.js and d3.js
Cubism: JavaScript library for time series visualization
Cytoscape: JavaScript library for visualizing complex networks
D3: JavaScript library for manipulating documents based on data
DC.js: Dimensional charting built to work natively with crossfilter rendered using d3.js. Excellent for connecting charts/additional metadata to hover events in D3
Envisionjs: dynamic HTML5 visualization
Freeboard: open source real-time dashboard builder for IoT and other web mashups
Gephi: An award-winning open-source platform for visualizing and manipulating large graphs and network connections
Google Charts: simple charting API
Grafana: Graphite dashboard frontend, editor and graph composer
Graphite: scalable Realtime Graphing
Highcharts: simple and flexible charting API
IPython: provides a rich architecture for interactive computing
Keylines: toolkit for visualizing the networks in your data
Matplotlib: plotting with Python
NVD3: chart components for d3.js
Peity: Progressive SVG bar, line and pie charts
Plot.ly: Easy-to-use web service that allows for rapid creation of complex charts, from heatmaps to histograms. Upload data to create and style charts with Plotly’s online spreadsheet. Fork others’ plots.
Recline: simple but powerful library for building data applications in pure JavaScript and HTML
Redash: open-source platform to query and visualize data
Sigma.js: JavaScript library dedicated to graph drawing
Vega: a visualization grammar
Graph Databases
Apache Giraph: implementation of Pregel, based on Hadoop
Apache Spark Bagel: implementation of Pregel, part of Spark
ArangoDB: multi-model distributed database
Facebook TAO: distributed data store that is widely used at Facebook to store and serve the social graph
Faunus: Hadoop-based graph analytics engine for analyzing graphs represented across a multi-machine compute cluster
Google Cayley: open-source graph database
Google Pregel: graph processing framework
GraphLab PowerGraph: a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API
GraphX: resilient Distributed Graph System on Spark
Gremlin: graph traversal Language
InfiniteGraph: distributed graph database
Infovore: RDF-centric Map/Reduce framework
Intel GraphBuilder: tools to construct large-scale graphs on top of Hadoop
MapGraph: Massively Parallel Graph processing on GPUs
Neo4j: graph database written entirely in Java
OrientDB: document and graph database
Phoebus: framework for large scale graph processing
Sparksee: scalable high-performance graph database
Titan: distributed graph database, built over Cassandra
Twitter FlockDB: distributed graph database
NewSQL
Actian Ingres: commercially supported, open-source SQL relational database management system
BayesDB: statistic oriented SQL database
Cockroach: Scalable, Geo-Replicated, Transactional Datastore
Datomic: distributed database designed to enable scalable, flexible and intelligent applications
FoundationDB: distributed database, inspired by F1
Google F1: distributed SQL database built on Spanner
Google Spanner: globally distributed semi-relational database
H-Store: experimental main-memory, parallel database management system optimized for on-line transaction processing (OLTP) applications
HandlerSocket: NoSQL plugin for MySQL/MariaDB
IBM DB2: object-relational database management system
InfiniSQL: infinitely scalable RDBMS
MemSQL: in-memory SQL database with optimized columnar storage on flash
NuoDB: SQL/ACID compliant distributed database
Oracle Database: object-relational database management system
Oracle TimesTen in-Memory Database: in-memory, relational database management system with persistence and recoverability
Pivotal GemFire XD: Low-latency, in-memory, distributed SQL data store. Provides SQL interface to in-memory table data, persistable in HDFS
SAP HANA: in-memory, column-oriented, relational database management system
SenseiDB: distributed, realtime, semi-structured database
Sky: database used for flexible, high performance analysis of behavioral data
SymmetricDS: open source software for both file and database synchronization
Teradata Database: complete relational database management system
VoltDB: in-memory NewSQL database
Other

