Join Order Search
There was a join order search hook introduced in PostgreSQL version 8.3 which allows users to gain the control over the algorithm used to search for the most suitable join order during the planning phase for the query. We are working on implementing various heuristic algorithms to address join order problem and would like to perform comparisons of those algorithms. Based on the results and feedback from the community some of these algorithms could become a part of PostgreSQL core distribution.
Join Order Search Hook
There is a function type join_search_hook_type defined in paths.h reflecting the function prototype to be used as an entry to join order search algorithms. The pointer to the function to be called is stored in join_search_hook variable defined in paths.h as extern symbol and allpaths.c. If this variable is not NULL the specified function will be called at make_rel_from_joinlist, otherwise the standard algorithms would be used based on the settings (dynamic programming or genetic query optimizer).
To create a loadable module which will activate the probe and point it to the desired function you should do the following:
- Create the function to be called for search for the best join order.
- Create the PostgreSQL external module - using PG_MODULE_MAGIC macro and linking the binaries as a shared library.
- Assign the desired function to join_search_hook in module initialization function - void _PG_init(void).
- Assign NULL to join_search_hook in module finalization function - void _PG_fini(void).
If you wish to call the standard algorithm (dynamic programming only, not geqo!!!) for the join order search problem you can call standard_join_search function to do that. Here is the simple module that just calls the standard algorithm to do the job:
#include "postgres.h" #include "fmgr.h" #include "optimizer/geqo.h" #include "optimizer/paths.h" #include "optimizer/pathnode.h" PG_MODULE_MAGIC; void _PG_init(void); void _PG_fini(void); static RelOptInfo * dummy_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) { return standard_join_search(root, levels_needed, initial_rels); } void _PG_init(void) { join_search_hook = dummy_join_search; } void _PG_fini(void) { join_search_hook = 0; }
Goal
We are aiming to implement various different algorithms to solve the join order problem during the query planning in PostgreSQL. The standard dynamic programming optimizer works well for smaller number of relations, so we would like to focus on the cases where the number of relations is going to be too high for extensive search over the whole space and some heuristics have to be employed for the job.
PostgreSQL currently tries to avoid calling the genetic query optimizer (GeQO) by setting join_collapse_limit and from_collapse_limit parameters far from the GeQO execution threshold.
If the algorithms would be good enough we could increase these parameters and allow the planner to consider "all the available" options for join order.
Project
There is a related project created on pgfoundry.org to cover the implementation. You can download the code from the CVS. The name of the module you are looking for is jos.
Simply grab the source from CVS..
cvs -d :pserver:[email protected]:/cvsroot/optimizer login cvs -z3 -d :pserver:[email protected]:/cvsroot/optimizer checkout -P jos
and just run...
bash-3.00$ ./configure bash-3.00$ make --prefix=/path/to/pgsql/prefix bash-3.00$ make install
Now, all the module should be installed in the same directory as PostgreSQL libraries.
Modules
There are these modules
- dummy - dummy module just executing the standard_join_search function.
- compare - module executing various algorithms and printing out the plan cost for each.
- simgre - greedy algorithm with just depth of 1.
- greedy - greedy algorithm with the configurable depth.
To use the specified module load the specified library with
postgres=# LOAD 'josdummy.so'; LOAD
dummy
Just a dummy module that coudl be used as a template. It always executes the dynamic programming optimizer.
simgre
This is the simple greedy algorithm with the depth of one.
greedy
Disabled for now, implementation is not finished.
It is supposed to be a greedy algorithm with a configurable depth and/or lookahead in the future.
compare
Module that executes various available functions with the resulting values. Currently, it exeutes only standard optimizer and genetic query optimizer,
Algorithms
Greedy
The idea of greedy algorithm is very simple. In every step we have n relations that we would like to join. We would check all the possible combinations of relation pairs that we could perform a join on. We calculate the cost of joining every pair and we would pickup the cheapest one and make the corresponding join in this step. This would reduce the number of relations to be joined and now we have just n-1 relations. Finally, if the number of relations is reduced we can execute the standard dynamic programming optimizer.
We could enhance the algorithm and in every step we could try joining more relations (specified by the configuration parameter). We could even look for more relations and perform just the first few of them, e.g. we could try looking for a cheapest join of 5 relations but finally joining just 3 relations in one step.
Issuses/Concerns
- In some cases make_join_rel function might return false for the case where creating such a join is not possible. We have to make it clear whether the greedy approach might end in a situation with no further solution. If so, we have to make (find out ;-) an appropriate modifications.
ToDo
- Implement the greedy module.
- Dispose RelOptInfo structures that are not needed anymore.
- Implement other modules.
- Test the code.
Feedback
If you have any comments, questions or suggestions, please feel free to write to
- Julius Stroffek - julo at stroffek dot net; julius dot stroffek at sun dot com
- Tomas Kovarik - tkovarik at gmail dot com