Anvil

GPU-accelerated analytics

Background

Graphics processing units (GPUs) were originally designed to render complex images, offloading that work from CPUs, but they are essentially devices that excel at performing mathematical calculations rapidly and in parallel. Compared with CPUs, GPUs have many times more cores and much greater memory bandwidth.

GPU databases are in-memory column-store relational databases that can be defined and queried using SQL. Features vary by vendor, but notable ones include:

  • query performance orders of magnitude faster than CPU-based solutions
  • support for a database of several terabytes on a single server, or up to 100TB according to SQream's CEO
  • linear scale-out across clusters
  • no need for indexes
  • very fast data ingestion rates
  • a visualisation tool that generates content server-side based on huge datasets
  • support for querying from conventional end-user tools such as Tableau.
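To make the "no need for indexes" point concrete, here is a minimal sketch in Python of how a column store can answer a filtered aggregate by scanning whole columns in independent chunks and combining the partial results. It is not any vendor's implementation: the table, the column names and the functions partial_sum and sum_where are all hypothetical, and Python threads only illustrate the shape of the computation rather than delivering real GPU-style parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory column store: each column is a plain list.
# The table and column names are illustrative only.
sales = {
    "region": ["north", "south", "north", "east", "south", "north"],
    "amount": [120.0, 75.5, 30.0, 210.0, 64.5, 50.0],
}


def partial_sum(regions, amounts, wanted):
    """Scan one chunk of the two columns and sum the matching amounts."""
    return sum(a for r, a in zip(regions, amounts) if r == wanted)


def sum_where(table, wanted, chunk_size=2):
    """Rough equivalent of:
        SELECT SUM(amount) FROM sales WHERE region = <wanted>
    Every chunk is scanned independently and the partial sums are
    combined at the end. No index is consulted at any point.
    """
    regions, amounts = table["region"], table["amount"]
    starts = range(0, len(regions), chunk_size)
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda s: partial_sum(
                regions[s:s + chunk_size], amounts[s:s + chunk_size], wanted
            ),
            starts,
        )
        return sum(partials)


total_north = sum_where(sales, "north")
print(total_north)  # 200.0
```

Because each chunk is processed without reference to any other, the same pattern can in principle be spread across thousands of GPU cores, which is why a brute-force scan can stand in for an index lookup.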

Use Cases

GPU databases are not suited to transactional systems; they are intended for analysing non-streaming structured datasets. In those circumstances, these databases seem to provide an alternative to:

  1. smaller Hadoop clusters using Hive/Impala/Presto/Spark SQL
  2. Amazon Redshift, Exadata Service, Azure SQL Data Warehouse
  3. SAP HANA, Oracle Exadata, SQL Server PDW
  4. conventional BI systems such as SQL Server + Analysis Services (SSAS)

GPU databases can be much cheaper and less complex than the first three for terabyte-sized datasets. Also, Hadoop was never designed for relational data analytics, despite all the products that have been developed to mitigate that.

Point (4) does not attract much attention, but I think it is valid. Even a small SSAS-centric system of a few hundred gigabytes entails significant complexity: partitioning, indexing, aggregations, and keeping the analytical database synchronised with the data warehouse. A GPU database is much simpler and more performant.

Vendors

The most prominent GPU database vendors are:

On the hardware side, Nvidia dominates the market and publishes much more information on the subject. GPU servers are quite a recent phenomenon, but their availability in the cloud may speed adoption, for example via Amazon's P2 instances. HP and Dell are also increasing their offerings for on-premises solutions.