We cover basic installation and usage of PostgreSQL and MADlib on OSX and Ubuntu. Instructions for other environments should be similar. PostgreSQL is an open source database with enterprise functionalities which often lack in MySQL. MADlib is an open-source library which enhance a PostgreSQL or Greenplum database with functionalities for scalable in-database analytics.
First, we need to have a PostgreSQL installation. On OSX, we will use Homebrew instead of compiling the source by ourselves. On Ubuntu, apt-get will do it. Note, Homebrew is expected to be present on your machine. At the time of this writing, the current proposed version is 9.0.1. The following commands will install and configure a new PostgreSQL installation. Additionally, it will be registered as a startup service.
1 2 3 4 5 6
In case your having trouble on OSX lion with the following message:
You should edit “/etc/sysctl.conf” or create it if it doesn’t exist and write:
Make sure your
postgres command map to your new PostgreSQL installation, not to the default one. Executing
which postgres should print “/usr/local/bin/postgres”. If you see “/usr/bin/postgres” instead, you need to modify your path and place “/usr/local/bin” before “/usr/bin”. To do do, you may edit your “~/.bashrc”, “~/.profile” or “/etc/paths” file. Here’s an example:
1 2 3
Your PostgreSQL server should already be up and running. In the future, if you want to start/stop the server:
1 2 3 4
Note, the Homebrew formula configure the system with local “trust” authentication. Authentication is done against the current logged-in Unix user. Homebrew does not create a “postgres” user and the server is launch using the same user account under which you ran Homebrew.
You can test to connect to your new PostgreSQL server by issuing with
psql. For example, the command
psql -d postgres -c "\l" with open a connection to the “postgres” database (using your current username) and run the command
\l to list the databases.
Now, let’s please pgadmin by installing the Instrumentation functions not present by default.
1 2 3 4 5
We start by downloading the MADlib package and run the installation program.
MADlib comes with a complete set of test which you can also use as a usage reference or simply as a source of inspiration. The script are present in “/usr/local/madlib/ports/postgres/modules/*/test/*.sql”. To run the test suite:
MADlib: quick quantile example
We will test quantile function. For a simple definition, a quantile is the division of a total into equal subgroups.
As an example, the “.5” quantile of the serie ‘.25,.45,.5,.5,.55,.75’ is “.5”. If we substract “.1” to each element and use the same quantile, then the “.5” quantile of the serie ‘.15,.35,.4,.4,.45,.65’ is “.4”. We may interpret the result by saying that the serie is divided in two at “.4”. Finally, using the last serie, the “.75” quantile is “.425” meaning that one fourth of the values are after “.415”.
We will use the MADlib quantile test as a source of inspiration. It is located at “/usr/local/madlib/ports/postgres/modules/quantile/test/quantile.sql_in”.
Our simplified version will :
- Create a new table
- Insert some random values
- Compute the “.5” quantile
- Drop the table
1 2 3 4
random()*100 function will generate a uniform representation between 0 and 100, we expect a result relatively close to 50.