CHATDB FOR LARGE-SCALE ENTERTAINMENT DATASETS
Final capstone project for the Foundations of Data Management course (DSCI 551) at USC Viterbi. A full-stack AI chatbot enabling natural language querying and modification of 20M+ records across PostgreSQL, MySQL, and MongoDB. Built with Python, Spark, LangChain, and GPT-3.5; implements secure APIs with authentication and access control for production-like environments.
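At its core, the app turns a user's question into a database query with LangChain and GPT-3.5. The snippet below is a minimal sketch of that pattern, not the project's exact code (which lives in code/app.py); the connection string and question are placeholders.

from langchain_openai import ChatOpenAI
from langchain_community.utilities import SQLDatabase
from langchain.chains import create_sql_query_chain

# Placeholder URI; substitute your own MySQL credentials
db = SQLDatabase.from_uri("mysql+pymysql://root:<password>@localhost/project551")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

chain = create_sql_query_chain(llm, db)  # the LLM drafts SQL from the schema and question
sql = chain.invoke({"question": "Which 10 titles have the highest rating?"})
print(db.run(sql))  # execute the generated query against the database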
To access our full dataset, please visit this Drive Link. The original dataset can be found on Kaggle.
pip install -r requirements.txt
brew install mysql
brew services start mysql
pip install streamlit
pip install langchain langchain_community langchain_openai
pip install pymysql
- Log in to MySQL
mysql -u root -p
- Enter your password when prompted
- Create the database
CREATE DATABASE project551;
- Use the created database
USE project551;
- Upload the dataset (an optional Python sanity check is sketched below)
source dataset.sql
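To confirm the import worked, you can run a quick check from Python with pymysql (installed above). This is only a sketch; the password is a placeholder for your own credentials.

import pymysql

# Placeholder credentials; use the ones you set for MySQL
conn = pymysql.connect(host="localhost", user="root", password="<password>", database="project551")
with conn.cursor() as cur:
    cur.execute("SHOW TABLES;")
    print(cur.fetchall())  # should list the tables created by dataset.sql
conn.close()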
- Log in to PostgreSQL
psql -U postgres
- Create the database
CREATE DATABASE project551;
- Connect to the created database
\c project551
- Upload the dataset (an optional Python sanity check is sketched below)
\i dataset_psql.sql
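As with MySQL, you can verify the load from Python. The sketch below uses LangChain's SQLDatabase wrapper (the same one the app builds on) and assumes a Postgres driver such as psycopg2-binary is installed; the credentials are placeholders.

from langchain_community.utilities import SQLDatabase

# Placeholder URI; requires a driver, e.g. pip install psycopg2-binary
db = SQLDatabase.from_uri("postgresql+psycopg2://postgres:<password>@localhost:5432/project551")
print(db.get_usable_table_names())  # should list the tables created by dataset_psql.sql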
- Place your JSON file
Save the JSON file you want to upload to MongoDB in the CHATDB-FOR-LARGE-SCALE-ENTERTAINMENT-DATASETS folder
- Edit the script with your MongoDB credentials
In the mongo_db.py file, find the init_database function and call it with your MongoDB username, password, and appName
- Update the file path and collection name
In the upload_data_to_mongo function, update the file path to match the location of your JSON file, as well as the collection name
- Run the upload
Call the upload_data_to_mongo function to upload the collections to the MongoDB database (a pymongo equivalent is sketched below)
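The helpers' exact signatures are defined in code/mongo_db.py; the sketch below shows the equivalent steps with pymongo directly. The connection string, database, collection, and file names are all placeholders.

import json
from pymongo import MongoClient

# Placeholder connection string; fill in your cluster details and appName
client = MongoClient("mongodb+srv://<username>:<password>@<cluster>.mongodb.net/?appName=<appName>")
collection = client["<database_name>"]["<collection_name>"]

# Load the JSON file placed in the project folder and insert its documents
with open("<your_file>.json") as f:
    docs = json.load(f)
collection.insert_many(docs if isinstance(docs, list) else [docs])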
- Launch the NLI for MySQL/PostgreSQL
streamlit run app.py
- Launch the NLI for MongoDB
streamlit run mongo_db.py
|--requirements.txt
|--README.md
|--code/
|   |--app.py            # Used to create the NLI and generate queries for MySQL/PostgreSQL
|   |--mongo_db.py       # Used to create the NLI and generate queries for MongoDB
|   |--dataset.sql       # Used to upload data to MySQL
|   |--dataset_psql.sql  # Used to upload data to PostgreSQL
|--reports/
|   |--Draft- Group Proposal.pdf
|   |--Mid Progress Report.pdf
|   |--551_ Group Proposal Final.pdf
|   |--CHATDB_Final_Report.pdf
For privacy and security reasons, we have not included the API keys used in this GitHub repository.
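You will therefore need to supply your own OpenAI key to run the chatbot. If the code relies on langchain_openai's default behavior, exporting it as an environment variable before launching Streamlit is enough (check app.py and mongo_db.py for the exact mechanism used):

export OPENAI_API_KEY="sk-..."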