Adaltas

Installing and using MADlib with PostgreSQL on OSX

We cover basic installation and usage of PostgreSQL and MADlib on OSX and Ubuntu. Instructions for other environments should be similar. PostgreSQL is an open source database with enterprise functionalities which often lack in MySQL. MADlib is an open-source library which enhance a PostgreSQL or Greenplum database with functionalities for scalable in-database analytics.

PostgreSQL: installation

First, we need to have a PostgreSQL installation. On OSX, we will use Homebrew instead of compiling the source by ourselves. On Ubuntu, apt-get will do it. Note, Homebrew is expected to be present on your machine. At the time of this writing, the current proposed version is 9.0.1. The following commands will install and configure a new PostgreSQL installation. Additionally, it will be registered as a startup service.

OSX:

1
2
3
4
5
6
brew update
brew install postgres
initdb /usr/local/var/postgres
mkdir -p ~/Library/LaunchAgents
cp /usr/local/Cellar/postgresql/9.1.3/homebrew.mxcl.postgresql.plist ~/Library/LaunchAgents/
launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist

In case your having trouble on OSX lion with the following message:

1
2
FATAL:  could not create shared memory segment: Cannot allocate memory
DETAIL:  Failed system call was shmget(key=1, size=1646592, 03600).

You should edit “/etc/sysctl.conf” or create it if it doesn’t exist and write:

1
2
kern.sysv.shmall=65536
kern.sysv.shmmax=16777216

Unbuntu:

1
2
apt-get update
apt-get install postgres

Make sure your postgres command map to your new PostgreSQL installation, not to the default one. Executing which postgres should print “/usr/local/bin/postgres”. If you see “/usr/bin/postgres” instead, you need to modify your path and place “/usr/local/bin” before “/usr/bin”. To do do, you may edit your “~/.bashrc”, “~/.profile” or “/etc/paths” file. Here’s an example:

1
2
3
echo "export PATH=/usr/local/bin:\$PATH" >> ~/.bashr
. ~/.profile
which postgres

Your PostgreSQL server should already be up and running. In the future, if you want to start/stop the server:

1
2
3
4
# Start
pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
# Stop
pg_ctl -D /usr/local/var/postgres stop -s -m fast

PostgreSQL: running

Note, the Homebrew formula configure the system with local “trust” authentication. Authentication is done against the current logged-in Unix user. Homebrew does not create a “postgres” user and the server is launch using the same user account under which you ran Homebrew.

You can test to connect to your new PostgreSQL server by issuing with psql. For example, the command psql -d postgres -c "\l" with open a connection to the “postgres” database (using your current username) and run the command \l to list the databases.

Now, let’s please pgadmin by installing the Instrumentation functions not present by default.

1
2
3
4
5
psql -d postgres -c "CREATE EXTENSION "adminpack";"
# List installed extensions
psql -d postgres -c "select * from pg_extension"
# List available extensions
psql -d postgres -c "select * from pg_available_extensions"

That’s pretty much with PostgreSQL, now let’s move on with installing the MADlib library.

MADlib: installation

We start by downloading the MADlib package and run the installation program.

Once installed, MADlib need to be registered against a PostgreSQL database. We start by creating a database “mad” and then we register it:

1
2
psql -d postgres -c "CREATE DATABASE mad"
/usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/mad install

MADlib: testing

MADlib comes with a complete set of test which you can also use as a usage reference or simply as a source of inspiration. The script are present in “/usr/local/madlib/ports/postgres/modules/*/test/*.sql”. To run the test suite:

1
/usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/mad install-check

MADlib: quick quantile example

We will test quantile function. For a simple definition, a quantile is the division of a total into equal subgroups.

As an example, the “.5” quantile of the serie ‘.25,.45,.5,.5,.55,.75’ is “.5”. If we substract “.1” to each element and use the same quantile, then the “.5” quantile of the serie ‘.15,.35,.4,.4,.45,.65’ is “.4”. We may interpret the result by saying that the serie is divided in two at “.4”. Finally, using the last serie, the “.75” quantile is “.425” meaning that one fourth of the values are after “.415”.

We will use the MADlib quantile test as a source of inspiration. It is located at “/usr/local/madlib/ports/postgres/modules/quantile/test/quantile.sql_in”.

Our simplified version will :

  1. Create a new table
  2. Insert some random values
  3. Compute the “.5” quantile
  4. Drop the table
1
2
3
4
CREATE TABLE TestQuantile ( val FLOAT );
INSERT INTO TestQuantile SELECT random()*100 FROM generate_series(1,1000);
SELECT MADLIB.quantile('TestQuantile', 'val', 0.5);
DROP TABLE TestQuantile;

Since our random()*100 function will generate a uniform representation between 0 and 100, we expect a result relatively close to 50.

Comments