The amount of data produced by scientific research and business is growing rapidly. Because of their performance and especially their cost advantages, more and more installed systems use a shared-nothing cluster architecture. This affects the architecture of data processing systems: because the hardware is massively parallel, programming paradigms from high-performance computing are being carried over into data processing. Database research struggles to keep up with this trend. A key feature of traditional database systems is transparent access to the stored data, so that users do not have to know or manage where the data is stored. This transparency introduces data dependencies and increases both system complexity and inter-process communication. Many developers are therefore giving up this feature in exchange for better scalability. However, explicitly managing data distribution and data flow requires a deep understanding of the distributed system and limits the possibilities for automatic and autonomic optimization.
This talk presents a novel allocation algorithm that is suited to modern hardware and database system architectures. The allocation is computed automatically and is transparent to the user. The approach is query-centric: it scales through inter-query parallelism, and the degree of parallelism can be adjusted by changing how queries are grouped. The algorithm optimizes the data distribution so that queries can be executed locally, and it balances the workload according to the query history.
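The abstract does not spell out the algorithm itself, so the following is only a minimal illustrative sketch of the general idea under stated assumptions, not the presented method. It assumes that the query history has been summarized into hypothetical `QueryClass` records, each listing the data fragments a query reads and a weight for its share of the workload. A greedy longest-processing-time heuristic assigns each query class to one node (one query group per node) and replicates every fragment that class needs onto that node, so all of its queries can run locally. The names `QueryClass`, `Node`, `allocate`, and `num_nodes` are invented for this example. Changing `num_nodes` changes the grouping and thus the degree of inter-query parallelism.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryClass:
    """Hypothetical summary of one query type taken from the query history."""
    name: str
    fragments: frozenset[str]  # data fragments the query reads
    weight: float              # assumed workload share (e.g. frequency * cost)


@dataclass
class Node:
    """One shared-nothing node holding one group of query classes."""
    queries: list[str] = field(default_factory=list)
    fragments: set[str] = field(default_factory=set)
    load: float = 0.0


def allocate(history: list[QueryClass], num_nodes: int) -> list[Node]:
    """Greedy sketch (not the author's algorithm): every query class goes to
    exactly one node, which receives all fragments that class needs, so the
    class can be executed locally without inter-node communication."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be positive")
    nodes = [Node() for _ in range(num_nodes)]
    # Place the heaviest query classes first (longest-processing-time heuristic).
    for q in sorted(history, key=lambda q: q.weight, reverse=True):
        # Choose the least-loaded node; on equal load, prefer the node that
        # needs the fewest additional fragments (less replication).
        target = min(nodes, key=lambda n: (n.load, len(q.fragments - n.fragments)))
        target.queries.append(q.name)
        target.fragments |= q.fragments  # replicate missing fragments here
        target.load += q.weight
    return nodes


if __name__ == "__main__":
    history = [
        QueryClass("q1", frozenset({"orders", "customers"}), 5.0),
        QueryClass("q2", frozenset({"orders", "lineitem"}), 4.0),
        QueryClass("q3", frozenset({"parts"}), 3.0),
        QueryClass("q4", frozenset({"customers"}), 2.0),
    ]
    for i, node in enumerate(allocate(history, num_nodes=2)):
        print(i, node.queries, sorted(node.fragments), node.load)
```

With two nodes, this example yields the groups `q1, q4` and `q2, q3`, each with a load of 7.0. The `orders` fragment is replicated on both nodes. That reflects the trade-off such a query-centric scheme accepts: extra storage for replicated fragments in return for purely local query execution.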