Augmenting PostgreSQL with AI using EvaDB

Published in

EvaDB Blog

4 min readSep 7, 2023

PostgreSQL + AI (Credits: Crunchydata) — Photo by Crunchy Data

In the fast-evolving landscape of AI-driven data analytics, the gap between database systems and AI platforms slows down software developers. Switching back and forth between the database system and the AI platform not only introduces delays but also disrupts the rhythm of software development. What if there was a way to seamlessly integrate AI right into your PostgreSQL database system? That’s where EvaDB comes in.

In this article, we illustrate how EvaDB seamlessly integrates AI into your PostgreSQL workflows for solving complex data manipulation tasks. In particular, we demonstrate how EvaDB enables AI-powered semantic join between tables that do not directly share a column that can be joined on.

Challenge: “AI-Powered” Join

Consider a scenario where you have two tables — one with details about AirBnB listings in San Francisco and the other providing insights into the city’s parks. Our objective is to identify Airbnb listings located in neighborhoods with a high concentration of nearby parks. These tables/datasets lack a common column for a straightforward join. The Airbnb dataset includes a ‘neighborhood’ column, while the parks dataset features a ‘zip code’ column.

“AI-Powered” Join using EvaDB

EvaDB steps in to address this challenge by facilitating the merging operation using language models. Below shows the key query to create a new reference table that can be joined with other tables easily.

CREATE TABLE reference_table AS
SELECT parkname, parktype, 
       ChatGPT(
       "Return the San Francisco neighborhood name when provided with a zipcode. The possible neighborhoods are: {neighbourhoods_str}. The response should an item from the provided list. Do not add any more words.",
       zipcode) 
FROM postgres_db.recreational_park_dataset;

Full Tutorial for using EvaDB

Next, we will walk through the complete process of achieving the workflow in EvaDB, and you can give it a try on Google Colab.

Connect to EvaDB

Let’s first establish a connection between EvaDB and an existing PostgreSQL database with a single EvaDB query:

--- Connect EvaDB to an existing PostgreSQL database
CREATE DATABASE postgres_db
WITH ENGINE = "postgres",
PARAMETERS = {
  "user": "gaurav",
  "password": "password",
  "database": "airbnb"
};

With this connection in place, we can leverage the power of EvaDB inside the PostgreSQL environment.

Loading the Datasets

Our next step involves loading the dataset CSVs into our PostgreSQL database:

--- Create the first table
USE postgres_db{
  CREATE TABLE airbnb_listing (
    id INT,
    name VARCHAR(100),
    summary VARCHAR(1000),
    neighbourhood VARCHAR(100)
  )
}

--- Create the second table
USE postgres_db {
  CREATE TABLE recreational_park_dataset (
    parkName VARCHAR(100),
    ParkType VARCHAR(100),
    Zipcode INT
  )
}

--- Load the first table
USE postgres_db {
  COPY airbnb_listing(id, name, summary, neighbourhood) 
  FROM 'listings.csv'
  DELIMITER ',' CSV HEADER
}

--- Load the second table
USE postgres_db {
  COPY recreational_park_dataset(parkName, ParkType, Zipcode) 
  FROM 'recreational_park.csv'
  DELIMITER ',' CSV HEADER
}

Select * FROM postgres_db.airbnb_listing LIMIT 5

Select * FROM postgres_db.recreational_park_dataset LIMIT 5

Our datasets are now loaded into two tables in the PostgreSQL database, setting us up for our AI-powered data analysis.

Use AI-Powered Queries in EvaDB

EvaDB bridges the semantic gap between the two tables using OpenAI’s LLM model. We create a new ‘reference’ table to map San Francisco neighborhoods to their corresponding zip codes. First, we retrieve all distinct neighborhoods from our Airbnb listings, and then we use LLM to associate a zip code with each neighborhood. We use the following prompt:

“Return the San Francisco neighborhood name when provided with a zipcode. The possible neighborhoods are: {neighborhoods_str}. The response should be an item from the provided list. Do not add any additional words.”

“neighborhoods_str” is the aggregated distinct neighborhoods in the airbnb_listings. This table is generated using the following EvaDB query that calls ChatGPT.

--- Get a list of neighborhoods
SELECT DISTINCT(neighborhood) FROM airbnb_listing;

--- Create a reference table that maps neighborhoods to zipcodes using ChatGPT
CREATE TABLE reference_table AS
SELECT parkname, parktype, 
       ChatGPT(
       "Return the San Francisco neighborhood name when provided with a zipcode. The possible neighborhoods are: {neighbourhoods_str}. The response should an item from the provided list. Do not add any more words.",
       zipcode) 
FROM postgres_db.recreational_park_dataset;

Deriving insights using EvaDB

With our reference table prepared, it’s time for the crucial step. We will now combine the reference table with the Airbnb listings to gain valuable insights into properties located nearest to the city’s parks.

--- Map Airbnb listings to park
SELECT airbnb_listing.neighbourhood
FROM postgres_db.airbnb_listing
  JOIN reference_table ON airbnb_listing.neighbourhood = reference_table.response;

This aggregated table opens the door to various analytical possibilities. Below, we present a chart illustrating the number of parks in each neighborhood. This data can inform decisions when recommending potential Airbnb locations for individuals who enjoy staying in proximity to recreational parks.

Conclusion

In this article, we started exploring how EvaDB integrates AI capabilities into PostgreSQL and helps solve challenging data analytics tasks. We will highlight additional features of EvaDB thanks to AI-centric query optimization in the coming weeks. If you’re excited about our vision of bringing AI inside database systems, show some ❤️ by:

🐙 giving a ⭐ on our EvaDB repo on Github
📟 joining our Slack Community
🐦 following us on Twitter
📝 following us on Medium

Thanks for your time! 😊