How to Install the pgvector PostgreSQL Extension (Mac, Docker)
In this article, I'm going to show you how to install the pgvector PostgreSQL extension on a Mac using Docker.
PostgreSQL is a powerful open-source relational database management system with robust features for handling large datasets and supporting complex queries.
pgvector
The pgvector extension is an open-source vector similarity search add-on for Postgres. With the extension, you can perform:
- Exact and approximate nearest neighbor search
- L2 distance, inner product, and cosine distance
The pgvector extension is most often used to store and search embeddings, the numeric vectors produced by machine learning models such as LLMs (Large Language Models). This article, however, focuses only on getting the extension installed and working locally.
Docker
Docker, a popular containerization platform, allows you to run applications in isolated environments with ease. In this step-by-step guide, we will walk you through the process of installing PostgreSQL on a Mac using Docker.
Prerequisites
Before you begin, ensure that you have Docker installed on your Mac. You can download and install Docker Desktop from the official Docker website (https://www.docker.com/products/docker-desktop).
Docker Desktop must be running for the docker command to work from a terminal, so launch the app before you continue.
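A quick way to confirm that the terminal can reach Docker is sketched below. It relies on docker info, a standard Docker CLI command that fails when the engine is not reachable; the function name docker_is_running is a hypothetical helper of my own, not part of Docker.

```shell
# Hypothetical helper: succeeds only if the docker CLI can reach the Docker engine.
docker_is_running() {
  docker info >/dev/null 2>&1
}

if docker_is_running; then
  echo "Docker is running"
else
  echo "Docker is not running; launch Docker Desktop first"
fi
```

If the second message appears, start Docker Desktop, wait for it to finish loading, and run the check again.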
Step 1. Launch terminal
Open the Terminal application on your Mac. You can find it by searching for "Terminal" in Spotlight or navigating to Applications -> Utilities -> Terminal.
Step 2. Pull pgvector PostgreSQL Docker image
In the Terminal, execute the following command to pull the pgvector PostgreSQL Docker image from Docker Hub:
docker pull pgvector/pgvector:pg16
This command downloads the image, which contains PostgreSQL 16 with the pgvector extension preinstalled, to your local machine.
To verify that it was pulled down successfully, run this command:
docker images
Step 3. Create a Docker volume
We will create a Docker volume to ensure the persistence of PostgreSQL data. Execute the following command in the Terminal:
docker volume create pgvector-data
To verify that the volume was created, run this command:
docker volume ls
Step 4. Create a pgvector PostgreSQL container
Once the image is downloaded, create a PostgreSQL container using the following command. Replace <password> with your desired password for the PostgreSQL database:
docker run --name pgvector-container -e POSTGRES_PASSWORD=<password> \
-p 5432:5432 -v pgvector-data:/var/lib/postgresql/data \
-d pgvector/pgvector:pg16
This command creates a Docker container named "pgvector-container" with the specified password, publishes port 5432 (the default port for PostgreSQL) on your Mac, mounts the "pgvector-data" volume at the container's data directory, and runs the container in the background (-d).
Step 5. Verify the pgvector PostgreSQL container exists
To check if the container is running, execute this command:
docker ps
This command will display a list of running containers. Look for pgvector-container in the list.
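A container can show up in docker ps a moment before PostgreSQL inside it is ready to accept connections. The sketch below polls until the server answers. It assumes the image ships the pg_isready utility, as the official PostgreSQL images do; wait_for_postgres is a hypothetical helper name, and the default of 30 attempts is an arbitrary choice.

```shell
# Hypothetical helper: poll until PostgreSQL inside the container accepts connections.
# Usage: wait_for_postgres <container-name> [max-attempts]
wait_for_postgres() {
  container="$1"
  attempts="${2:-30}"   # assumption: about 30 seconds is enough for a local start
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # pg_isready exits with status 0 once the server accepts connections
    if docker exec "$container" pg_isready -U postgres >/dev/null 2>&1; then
      echo "PostgreSQL is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "PostgreSQL did not become ready after $attempts attempts" >&2
  return 1
}
```

After pasting the function into the terminal, run wait_for_postgres pgvector-container before moving on to the next step.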
Step 6. Access PostgreSQL
You can now access PostgreSQL using a client application or through the command line. To connect using the psql command-line tool, execute the following command:
docker exec -it pgvector-container psql -U postgres
This command connects to the PostgreSQL database running in the container and opens the psql command-line interface.
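Client applications running on the Mac connect through the port published in Step 4, typically with a connection URI. The sketch below assembles one from the values used in this article. The helper name pg_uri is hypothetical, and running psql directly on the Mac assumes you have installed it separately; it does not come with Docker.

```shell
# Hypothetical helper: build a connection URI for the container created in Step 4.
# Usage: pg_uri [database] [user] [host] [port]
pg_uri() {
  printf 'postgresql://%s@%s:%s/%s\n' \
    "${2:-postgres}" "${3:-localhost}" "${4:-5432}" "${1:-postgres}"
}

# Print the URI for the default "postgres" database
pg_uri
```

With psql installed on the Mac, psql "$(pg_uri)" should connect and prompt for the password you chose in Step 4. The same URI can be pasted into most graphical PostgreSQL clients.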
Step 7. Set up a database for the pgvector extension
Once connected to the psql command-line interface, you can execute SQL commands and interact with the PostgreSQL database. For example, you can create a new database like this (do not forget the semicolon at the end!):
CREATE DATABASE mydatabase;
To verify that the database was created, run this command inside psql:
\list
Step 8. Activate the extension for the database
From within the psql command line, run these commands:
- Connect to the database that you created previously (using \c):
\c mydatabase
- Activate the vector extension for the database:
CREATE EXTENSION vector;
- Confirm that the extension was added:
\dx
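The same check can be scripted from the Mac terminal, outside psql, which is handy if you want to repeat it later. The sketch below wraps docker exec in a small function. run_sql is a hypothetical helper name, and it assumes the container name from Step 4 and the database name from Step 7.

```shell
# Hypothetical helper: run one SQL statement inside the container and print bare output.
# Usage: run_sql <database> <sql>
run_sql() {
  db="$1"
  sql="$2"
  # -A: unaligned output, -t: rows only (no headers), -c: run a single command
  docker exec pgvector-container psql -U postgres -d "$db" -At -c "$sql"
}

# Example (run from the Mac terminal, not from inside psql):
#   run_sql mydatabase "SELECT extversion FROM pg_extension WHERE extname = 'vector';"
```

The example query reads the installed version of the vector extension from the pg_extension catalog; empty output means the extension has not been activated in that database.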
Step 9. Create a place to store vectors
- Create a new table with a vector column, the column type that the extension provides.
- In this case the vector has 3 dimensions:
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
- Confirm the table was added:
\dt
Step 10. Insert sample vector data
For this article I'm just using made-up vector values rather than real embeddings.
- Insert the sample vectors (each must match the column's vector length, in this case 3):
INSERT INTO items (embedding) VALUES ('[0.01,0.02,0.03]'), ('[0.04,0.05,0.06]');
INSERT INTO items (embedding) VALUES ('[0.07,0.08,0.09]'), ('[0.10,0.11,0.12]');
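- Perform a similarity search. The <=> operator returns the cosine distance, so subtracting it from 1 gives the cosine similarity.
The query vector is also just made-up data: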
SELECT embedding, 1 - (embedding <=> '[0.07,0.07,0.07]') AS cosine_similarity FROM items;
Direct match (the query vector equals a stored row, so that row's cosine similarity is 1):
SELECT embedding, 1 - (embedding <=> '[0.07,0.08,0.09]') AS cosine_similarity FROM items;
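To see where these similarity values come from, the calculation can be reproduced outside the database. The sketch below computes cosine similarity (the dot product divided by the product of the two vector lengths) with awk for the four sample rows against the first query vector. The helper name cosine_similarity is hypothetical, and because pgvector stores components as single-precision floats, the database may differ from this output in the last digits.

```shell
# Hypothetical helper: cosine similarity of two comma-separated vectors.
# Usage: cosine_similarity "x1,x2,x3" "y1,y2,y3"
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ",")
    m = split(b, y, ",")
    if (n != m) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # dot product
      na  += x[i] * x[i]   # squared length of the first vector
      nb  += y[i] * y[i]   # squared length of the second vector
    }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# Compare each stored sample vector with the query vector [0.07,0.07,0.07]
for v in "0.01,0.02,0.03" "0.04,0.05,0.06" "0.07,0.08,0.09" "0.10,0.11,0.12"; do
  echo "[$v] $(cosine_similarity "$v" "0.07,0.07,0.07")"
done
```

This prints 0.9258, 0.9869, 0.9948, and 0.9973 for the four rows. Note that [0.10,0.11,0.12] scores slightly higher than [0.07,0.08,0.09], because cosine similarity compares the direction of two vectors and ignores their magnitude.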
Consult the instructions on pgvector for more info (link in the references section below).
Step 11. Quit psql
To exit psql, run this command:
quit
Cleanup
If you don't mind losing anything you've set up, you can remove the container, image, and data volume with the following commands:
To remove the container:
docker stop pgvector-container
docker rm pgvector-container
To remove the image:
docker rmi pgvector/pgvector:pg16
To remove the volume:
Only do this if you don't mind permanently deleting all of the databases stored in it.
docker volume rm pgvector-data
Conclusion
Congratulations! You have installed PostgreSQL along with the pgvector extension on your Mac using Docker. Because the data lives in a Docker volume, it persists when the container is stopped, restarted, or even removed, as long as the volume itself is kept. When you're done working, stop the container with docker stop pgvector-container to conserve system resources; docker start pgvector-container brings it back later.