Abstract

This paper describes the architecture and design of Cubrick, a distributed multidimensional in-memory DBMS suited for interactive analytics over highly dynamic datasets. Cubrick has a strictly multidimensional data model composed of cubes, dimensions and metrics, and supports sub-second OLAP operations such as slice and dice, roll-up and drill-down over terabytes of data. All data stored in Cubrick is range partitioned by every dimension and stored within containers called bricks in an unordered and sparse fashion, which provides high data ingestion rates and indexed access through any combination of dimensions. We describe Cubrick’s internal data structures, distributed model and query execution engine, along with a few details about the current implementation. Finally, we present results from a thorough experimental evaluation that leveraged datasets and queries collected from a few internal Cubrick deployments at Facebook.
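To make the partitioning idea concrete, here is a minimal Python sketch of range partitioning by every dimension, with rows appended unordered to sparse bricks. It is not Cubrick's implementation: the schema in `DIMENSIONS`, the helper names (`brick_id`, `brick_ranges`, `Cube`, `sum_metric`) and the row-major way of combining per-dimension range indexes into a single brick id are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical schema: (dimension name, cardinality, range size).
# Coordinates are assumed to be integers already (e.g. dictionary-encoded).
DIMENSIONS = [("region", 8, 4), ("gender", 4, 2)]


def brick_id(coords):
    """Combine the per-dimension range indexes into one brick id (row-major)."""
    bid = 0
    for (name, cardinality, range_size), value in zip(DIMENSIONS, coords):
        if not 0 <= value < cardinality:
            raise ValueError(f"{name}={value} is outside [0, {cardinality})")
        ranges = -(-cardinality // range_size)  # ceiling division
        bid = bid * ranges + value // range_size
    return bid


def brick_ranges(bid):
    """Invert brick_id: the [lo, hi) coordinate range covered per dimension."""
    out = []
    for _, cardinality, range_size in reversed(DIMENSIONS):
        ranges = -(-cardinality // range_size)
        bid, idx = divmod(bid, ranges)
        out.append((idx * range_size, min((idx + 1) * range_size, cardinality)))
    return out[::-1]


class Cube:
    def __init__(self):
        # Sparse: a brick exists only once a row has landed in it.
        self.brick_map = defaultdict(list)

    def insert(self, coords, metric):
        # Unordered append; no sorting or index maintenance on ingestion.
        self.brick_map[brick_id(coords)].append((tuple(coords), metric))

    def sum_metric(self, filters):
        """Sum the metric over rows matching {dimension index: value} filters."""
        total = 0
        for bid, rows in self.brick_map.items():
            ranges = brick_ranges(bid)
            # Prune bricks whose ranges cannot contain a matching row.
            if any(not ranges[d][0] <= v < ranges[d][1] for d, v in filters.items()):
                continue
            for coords, metric in rows:
                if all(coords[d] == v for d, v in filters.items()):
                    total += metric
        return total
```

Because every dimension contributes to the brick id, a filter on any subset of dimensions can skip non-overlapping bricks, while an insert only needs to compute one id and append a row.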

Granular Partitioning data layout: an illustration of how records are associated with partitions in the two-dimensional example shown in Table 1, together with the per-dimension dictionary encodings. The figure illustrates how Cubrick organizes the dataset shown in Table 1. Once that dataset is loaded, three bricks are created and inserted into the brick map: 0, 2 and 3, containing 3, 1...
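The per-dimension dictionary encodings mentioned in the caption can be sketched as follows. This is an illustrative assumption rather than Cubrick's actual encoder: the class name `DictionaryEncoder` is hypothetical, and assigning ids in order of first appearance is a choice made here for simplicity.

```python
class DictionaryEncoder:
    """Maps the labels of one dimension to dense integer ids and back."""

    def __init__(self):
        self._ids = {}      # label -> id
        self._labels = []   # id -> label

    def encode(self, label):
        # Assign the next free id the first time a label is seen.
        if label not in self._ids:
            self._ids[label] = len(self._labels)
            self._labels.append(label)
        return self._ids[label]

    def decode(self, code):
        return self._labels[code]
```

Keeping one encoder per dimension means string labels become small integers before partitioning, so range arithmetic works on integer coordinates only.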


Continue reading at https://www.researchgate.net/publication/308340608_Cubrick_Indexing_Millions_of_Records_per_Second_for_Interactive_Analytics
