
CHATDB FOR LARGE-SCALE ENTERTAINMENT DATASETS

Final capstone project for the Foundations of Data Management course (DSCI 551) at USC Viterbi: a full-stack AI chatbot that enables natural language querying and modification of 20M+ records across PostgreSQL, MySQL, and MongoDB. Built with Python, Spark, LangChain, and GPT-3.5, with secure APIs using authentication and access control for production-like environments.
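At a high level, the NLI works by prompting GPT-3.5 with the database schema and the user's question, then executing the SQL the model returns. The sketch below shows the general prompt-assembly idea only; the schema, wording, and function name are illustrative assumptions, not the project's actual prompt:

```python
def build_sql_prompt(schema: str, question: str) -> str:
    """Assemble a text-to-SQL prompt: schema context plus the user's question."""
    return (
        "You are a SQL assistant. Given the schema below, write one SQL "
        "query that answers the question. Return only the SQL.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\nSQL:"
    )

# Illustrative schema -- not the project's actual tables.
SCHEMA = "movies(id INT, title VARCHAR(255), year INT, rating FLOAT)"
print(build_sql_prompt(SCHEMA, "Which 10 movies have the highest rating?"))
```

In the real app, the assembled prompt would be sent through LangChain to GPT-3.5 and the returned SQL executed against the chosen database.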

To access our full dataset, please visit this Drive link. The original dataset can be found on Kaggle.

To install all required libraries and packages

pip install -r requirements.txt

To install MySQL (macOS, via Homebrew)

brew install mysql

To start the MySQL service

brew services start mysql

To install Streamlit

pip install streamlit

To install LangChain

pip install langchain langchain_community langchain_openai

To install pymysql

pip install pymysql

Upload to MySQL

  1. Log in to MySQL

mysql -u root -p

  2. Enter your password when prompted
  3. Create the database

CREATE DATABASE project551;

  4. Use the created database

USE project551;

  5. Upload the dataset

source dataset.sql
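If you prefer to load the dump from Python instead of the mysql client, the statements in the .sql file can be split and executed one by one with pymysql. A minimal sketch; the connection defaults and the assumption that the dump contains only plain, semicolon-terminated statements are ours:

```python
def split_sql_statements(sql_text: str) -> list[str]:
    """Naively split a SQL dump into individual statements on ';'.

    Good enough for plain CREATE/INSERT dumps; does not handle semicolons
    inside string literals, delimiters, or stored routines.
    """
    statements = []
    for raw in sql_text.split(";"):
        stmt = raw.strip()
        if stmt and not stmt.startswith("--"):
            statements.append(stmt)
    return statements


def load_dump(path, host="localhost", user="root", password="", db="project551"):
    """Execute every statement in a .sql dump against a MySQL database."""
    import pymysql  # pip install pymysql; imported lazily so the splitter works alone

    with open(path, encoding="utf-8") as f:
        statements = split_sql_statements(f.read())
    conn = pymysql.connect(host=host, user=user, password=password, database=db)
    try:
        with conn.cursor() as cur:
            for stmt in statements:
                cur.execute(stmt)
        conn.commit()
    finally:
        conn.close()
```

For large dumps the `source` command in the mysql client is usually faster; the Python route is handy when the upload needs to happen from inside a script.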

Upload to PostgreSQL

  1. Log in to PostgreSQL

psql -U postgres

  2. Create the database

CREATE DATABASE project551;

  3. Connect to the created database

\c project551

  4. Upload the dataset

\i dataset_psql.sql

Upload to MongoDB

  1. Place your JSON file

Save the JSON file you want to upload to MongoDB in the CHATDB-FOR-LARGE-SCALE-ENTERTAINMENT-DATASETS folder.

  2. Edit the script with your MongoDB credentials

In the mongo_db.py file, find the init_database function and call it with your MongoDB username, password, and appName.

  3. Update the file path and collection name

In the upload_data_to_mongo function, update the file path to match the location of your JSON file, as well as the collection name.

  4. Run the upload

Call the upload_data_to_mongo function to upload the collection to the MongoDB database.
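The steps above can be sketched as follows. The function names mirror the repo's init_database / upload_data_to_mongo, but the exact signatures, the Atlas cluster host, and the URI format are our assumptions:

```python
import json


def build_mongo_uri(username, password, cluster_host, app_name):
    """Build a MongoDB Atlas SRV connection string from the credentials."""
    return (f"mongodb+srv://{username}:{password}@{cluster_host}/"
            f"?retryWrites=true&w=majority&appName={app_name}")


def upload_json_to_mongo(uri, db_name, collection_name, json_path):
    """Read a JSON file and insert its documents into a collection."""
    from pymongo import MongoClient  # pip install pymongo

    with open(json_path, encoding="utf-8") as f:
        docs = json.load(f)
    if isinstance(docs, dict):  # single document -> one-element list
        docs = [docs]
    client = MongoClient(uri)
    try:
        result = client[db_name][collection_name].insert_many(docs)
        return len(result.inserted_ids)
    finally:
        client.close()
```

With a valid URI, a call like `upload_json_to_mongo(uri, "project551", "movies", "movies.json")` would insert every document from the file and return the count.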

Start Streamlit to interact with our NLI in real time. Use app.py for MySQL/PostgreSQL and mongo_db.py for MongoDB:

streamlit run app.py

streamlit run mongo_db.py

Repository Structure

|--requirements.txt
|--README.md
|--code/
|  |--app.py            # Creates the NLI and generates queries for MySQL/PostgreSQL
|  |--mongo_db.py       # Creates the NLI and generates queries for MongoDB
|  |--dataset.sql       # Uploads data to MySQL
|  |--dataset_psql.sql  # Uploads data to PostgreSQL
|--reports/
|  |--Draft- Group Proposal.pdf
|  |--Mid Progress Report.pdf
|  |--551_ Group Proposal Final.pdf
|  |--CHATDB_Final_Report.pdf

OpenAI API Keys

For privacy and security reasons, we have not included our API keys in this GitHub repository.

However, we have added comments in the code indicating where to replace the variable OPENAI_API_KEY with your personal API key.
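Rather than pasting the key directly into the source, you can read it from the environment; a small sketch (the environment-variable name matches the code's OPENAI_API_KEY variable, and the fallback/message are our additions):

```python
import os

# Read the key from the environment rather than hard-coding it in the source.
# The empty-string fallback is a placeholder; export OPENAI_API_KEY in your
# shell before launching Streamlit.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")

if not OPENAI_API_KEY:
    print("Set OPENAI_API_KEY in your environment before running the app.")
```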
