
How to Install pgvector PostgreSQL extension (Mac, Docker)

In this article, I'm going to show you how to install the pgvector PostgreSQL extension on a Mac using Docker.

PostgreSQL is a powerful open-source relational database management system with robust features for handling large datasets and supporting complex queries.

pgvector

The pgvector extension is an open-source vector similarity search add-on for Postgres. With the extension, you can perform:

  • Exact and approximate nearest neighbor search
  • Searches using L2 distance, inner product, or cosine distance
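To make the three distance measures concrete, here is a minimal pure-Python sketch of what they compute. It runs outside the database, and the function names are my own (hypothetical), not part of pgvector. In SQL, pgvector exposes these measures as the `<->` (L2 distance), `<#>` (negative inner product) and `<=>` (cosine distance) operators. The `exact_nearest_neighbors` helper shows exact search as a full scan of every row; approximate search (pgvector's HNSW and IVFFlat indexes) trades some accuracy for speed and isn't sketched here.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def _check_dims(a: Vector, b: Vector) -> None:
    # pgvector raises an error for vectors of different dimensions; do the same.
    if len(a) != len(b):
        raise ValueError(f"different vector dimensions {len(a)} and {len(b)}")


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance: what pgvector's <-> operator computes."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Vector, b: Vector) -> float:
    """Dot product. Note that pgvector's <#> operator returns its negative."""
    _check_dims(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: what pgvector's <=> operator computes.

    Assumes neither vector is all zeros (that would divide by zero here).
    """
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norms


def exact_nearest_neighbors(rows: List[Vector], query: Vector, k: int = 1,
                            distance: Callable[[Vector, Vector], float] = l2_distance
                            ) -> List[Vector]:
    """Exact k-NN: score every row against the query and keep the k closest."""
    return sorted(rows, key=lambda row: distance(row, query))[:k]


if __name__ == "__main__":
    print(l2_distance([0.0, 0.0], [3.0, 4.0]))        # 5.0
    print(inner_product([1, 2, 3], [4, 5, 6]))        # 32
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))    # 1.0
    rows = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
    print(exact_nearest_neighbors(rows, [0.9, 1.2]))  # [[1.0, 1.0]]
```

Cosine distance ignores vector length and compares only direction, which is why it's a common choice for text embeddings; L2 distance takes length into account.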

The pgvector extension is commonly used to store and search embeddings, such as those generated by LLMs (Large Language Models) or other embedding models. This article, however, focuses only on getting the extension installed and working locally.

Docker

Docker, a popular containerization platform, allows you to run applications in isolated environments with ease. In this step-by-step guide, I'll walk you through running PostgreSQL with the pgvector extension on a Mac using Docker.

Prerequisites

Before you begin, ensure that you have Docker installed on your Mac. You can download and install Docker Desktop from the official Docker website (https://www.docker.com/products/docker-desktop).

Docker Desktop must be running for the docker command to work from a terminal.

Step 1. Launch terminal

Open the Terminal application on your Mac. You can find it by searching for "Terminal" in Spotlight or navigating to Applications -> Utilities -> Terminal.

Step 2. Pull pgvector PostgreSQL Docker image

In the Terminal, execute the following command to pull the pgvector PostgreSQL Docker image from Docker Hub:

docker pull pgvector/pgvector:pg16 

This command downloads a PostgreSQL 16 image with the pgvector extension preinstalled to your local machine.

To verify that it was pulled down successfully, run this command:

docker images

Step 3. Create a Docker volume

We will create a Docker volume to ensure the persistence of PostgreSQL data. Execute the following command in the Terminal:

docker volume create pgvector-data

To verify that the volume was created, run this command:

docker volume ls

Step 4. Create a pgvector PostgreSQL container

Once the image is downloaded, create a PostgreSQL container using the following command. Replace <password> with the password you want for the postgres superuser:

docker run --name pgvector-container -e POSTGRES_PASSWORD=<password> \
-p 5432:5432 -v pgvector-data:/var/lib/postgresql/data \
-d pgvector/pgvector:pg16

This command creates and starts a Docker container named "pgvector-container" in the background (-d), sets the password for the postgres superuser, maps port 5432 on your Mac to port 5432 in the container (the default PostgreSQL port), and mounts the "pgvector-data" volume at the container's data directory.

Step 5. Verify the pgvector PostgreSQL container is running

To check if the container is running, execute this command:

docker ps

This command will display a list of running containers. Look for pgvector-container in the list.
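docker ps only tells you that the container is running. On the very first start, PostgreSQL needs a few seconds to initialize its data directory before it accepts connections. If you script these steps, a minimal sketch like the one below can wait for it by re-running pg_isready (a utility included in the PostgreSQL image) inside the container until it exits with status 0. The `wait_until_ready` helper and its defaults are my own assumptions, and it assumes the container name from Step 4. Checking over TCP with `-h localhost` is a common precaution, since, as far as I know, the image's first-time initialization briefly runs a server that listens only on a Unix socket.

```python
import subprocess
import time
from typing import Sequence

# pg_isready exits with status 0 once the server accepts connections, and
# docker exec passes that exit status through. "-h localhost" makes the check
# go over TCP instead of the Unix socket.
PG_ISREADY_CMD = ["docker", "exec", "pgvector-container",
                  "pg_isready", "-h", "localhost", "-U", "postgres"]


def wait_until_ready(cmd: Sequence[str] = PG_ISREADY_CMD,
                     timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Re-run cmd until it exits with status 0 (True) or timeout expires (False).

    Raises FileNotFoundError if the executable (e.g. docker) isn't installed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            return True
        # Not ready yet (or Docker Desktop isn't running): retry until the deadline.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, calling `wait_until_ready()` right after the docker run command in Step 4 returns True once PostgreSQL is accepting connections, or False after 60 seconds.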

Step 6. Access PostgreSQL

You can now access PostgreSQL using a client application or through the command line. To connect using the psql command-line tool, execute the following command:

docker exec -it pgvector-container psql -U postgres

This command connects to the PostgreSQL database running in the container and opens the psql command-line interface.
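If you'd rather use a client application or a database driver, most of them accept a libpq-style connection URI. The official postgres image (which pgvector/pgvector is built on) creates a superuser named postgres and a default database named postgres unless you set POSTGRES_USER or POSTGRES_DB. The sketch below uses a hypothetical `build_connection_uri` helper that assumes those defaults and the port mapping from Step 4, and percent-encodes each part so characters such as @ or / in the password don't break the URI.

```python
from urllib.parse import quote


def build_connection_uri(password: str, user: str = "postgres",
                         host: str = "localhost", port: int = 5432,
                         dbname: str = "postgres") -> str:
    """Build a postgresql:// URI for the container started in Step 4.

    quote(..., safe="") percent-encodes every reserved character, so a
    password like "p@ss word" becomes "p%40ss%20word".
    """
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{quote(dbname, safe='')}")


print(build_connection_uri("s3cret"))
# postgresql://postgres:s3cret@localhost:5432/postgres
```

Once you create mydatabase in the next step, pass `dbname="mydatabase"` to connect to it instead of the default database.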

Step 7. Create a database for pgvector

Once connected to the psql command-line interface, you can execute SQL commands and interact with the PostgreSQL database. For example, you can create a new database like this (do not forget the semicolon at the end!):

CREATE DATABASE mydatabase;

To verify that it was created, run this command inside psql:

\list

Step 8. Activate the extension for the database

From within the psql command line, run these commands:

  • Connect to the database that you created previously (using \c):
\c mydatabase
  • Activate the vector extension for the database:
CREATE EXTENSION vector;
  • Confirm that the extension was added:
\dx

Step 9. Create a place to store vectors

  • Create a new table with a vector column
  • In this case, the vector has 3 dimensions:
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
  • Confirm the table was added:
\dt
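pgvector accepts vector values as text literals like '[0.01,0.02,0.03]', and a vector(3) column rejects values with a different number of dimensions. If you generate vectors in application code, a small helper like the hypothetical `to_vector_literal` below (a sketch, not part of pgvector) can format them and catch dimension mismatches before they reach the database.

```python
import math
from typing import Sequence


def to_vector_literal(values: Sequence[float], dims: int = 3) -> str:
    """Format numbers as a pgvector text literal such as '[0.01,0.02,0.03]'.

    Raises ValueError if the length doesn't match the column's dimensions
    (vector(3) in this article) or if a value is NaN or infinite.
    """
    if len(values) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector values must be finite numbers")
    # repr() gives the shortest string that round-trips each float.
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

For example, `to_vector_literal([0.07, 0.08, 0.09])` returns '[0.07,0.08,0.09]', the same form used in the INSERT statements of the next step. When inserting from code, pass the string as a query parameter rather than pasting it into the SQL text.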

Step 10. Insert random vector data

For this article I'm just using random vector data.

  • Insert random vector data (each vector must have the same number of dimensions as the column, in this case 3):
INSERT INTO items (embedding) VALUES ('[0.01,0.02,0.03]'), ('[0.04,0.05,0.06]');
INSERT INTO items (embedding) VALUES ('[0.07,0.08,0.09]'), ('[0.10,0.11,0.12]');
  • Perform a similarity search:

The query vector is also just random data:

SELECT embedding, 1 - (embedding <=> '[0.07,0.07,0.07]') AS cosine_similarity FROM items;

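Direct match (the query equals one of the stored vectors, so that row's cosine similarity should be 1, or extremely close to it):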

SELECT embedding, 1 - (embedding <=> '[0.07,0.08,0.09]') AS cosine_similarity FROM items;
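Because `<=>` returns cosine distance, `1 - (embedding <=> query)` is cosine similarity, where 1 means the vectors point in exactly the same direction. The plain-Python sketch below (an illustration outside the database; the helper names are hypothetical) recomputes those values for the four rows inserted above so you can sanity-check what psql prints.

```python
import math

# The four rows inserted in Step 10.
ROWS = [
    [0.01, 0.02, 0.03],
    [0.04, 0.05, 0.06],
    [0.07, 0.08, 0.09],
    [0.10, 0.11, 0.12],
]


def cosine_similarity(a, b):
    """Cosine similarity, i.e. what 1 - (embedding <=> query) computes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_by_similarity(rows, query):
    """Return (row, similarity) pairs, most similar first."""
    scored = [(row, cosine_similarity(row, query)) for row in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The two query vectors used in the SELECT statements above.
for query in ([0.07, 0.07, 0.07], [0.07, 0.08, 0.09]):
    print("query", query)
    for row, sim in rank_by_similarity(ROWS, query):
        print(f"  {row} -> {sim:.4f}")
```

For the first query, [0.10,0.11,0.12] ranks highest because its components are the closest to equal, and cosine similarity compares only direction, not magnitude. For the direct match, [0.07,0.08,0.09] scores 1. pgvector stores vector components as single-precision floats, so the last digits psql shows may differ slightly from Python's double-precision results.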

Consult the pgvector documentation for more information (link in the References section below).

Step 11. Quit psql

To exit psql, run this command (\q works as well):

quit

Cleanup

If you don't mind losing anything you've set up, you can remove the container, image, and data with the following commands:

To remove the container:

docker stop pgvector-container
docker rm pgvector-container

To remove the image:

docker rmi pgvector/pgvector:pg16

To remove the volume:

Caution: only do this if you don't mind deleting all of your databases and their data.

docker volume rm pgvector-data

Conclusion

Congratulations! You have installed PostgreSQL along with the pgvector extension on your Mac using Docker. Because the data lives in a Docker volume, it persists even if the container is stopped, restarted, or removed and recreated with the same volume. This allows for data durability and easier management of your database. When you're done, stop the container (docker stop pgvector-container) to conserve system resources, and see the Cleanup section if you want to remove everything.

References