panderior/anyang-project

Anyang Project README

This project scrapes the arXiv site for papers, then uses an LLM (the Qwen 2.5 0.5B model) to filter them and show the user a word cloud and a bar chart of extracted keywords. The top 5 papers are also shown, ranked by their scores.

Sub-projects

  • anyang-scrapy: the Scrapy codebase used to scrape the target site
  • anyang-flask: the Flask backend that manages services and integrates the system components
  • anyang-llm: initial code for interacting with the Qwen LLM

Word Cloud visualization

Team Members:

  • Duguma Yeshitla
  • 천영학
  • 김현주
  • 이채림

Required software

  1. Apache Kafka
  2. Zookeeper
  3. Redis
  4. Celery
  5. Ollama
  6. MariaDB
  7. Docker

Supplementary Software

  1. Kafka UI
  2. Flower for Celery
  3. DBeaver

Deployment steps

Deploy Ollama

  1. Install Ollama on the device following the steps at https://ollama.com/download.
  2. The default Ollama instance is not accessible from external IPs, so it has to be stopped and restarted with an external-facing address. Here are the steps to achieve that.
    • Make sure the default ollama instance is not running. By default it listens on localhost, port 11434, which may block the new instance you will be running.
       	sudo systemctl status ollama
       	sudo systemctl stop ollama
       	sudo systemctl disable ollama
      
    • Also make sure no other Ollama instance is running; if there is one, kill it.
       # check for existing ollama instance
       ps aux | grep ollama
       
       # stop existing ollama instances if there are any
       sudo kill -9 <process_id>
      
    • Run ollama serve bound to all interfaces (0.0.0.0), i.e. accessible from anywhere.
       OLLAMA_HOST=0.0.0.0:11500 ollama serve	
      
    • Check that the required model is available at the externally bound (0.0.0.0) Ollama address; if not, pull it in this mode. The ollama serve command above needs to be running for the following to work.
       # check the list of models
       OLLAMA_HOST=0.0.0.0:11500 ollama list
      
       # pull the model if it does not exist (e.g. qwen3:0.6b)
       OLLAMA_HOST=0.0.0.0:11500 ollama pull <model:size>
       # for this project we need Qwen 2.5:0.5b model
       OLLAMA_HOST=0.0.0.0:11500 ollama pull qwen2.5:0.5b
      
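Once the model is pulled, the deployment can be verified from Python over Ollama's HTTP API. A minimal sketch, assuming the serve command above (host/port are example values; `ask_ollama` is a hypothetical helper, not part of the project code):

```python
# Sanity check against the Ollama server started above.
import json
import urllib.request

OLLAMA_URL = "http://0.0.0.0:11500"  # assumption: matches the serve command
MODEL = "qwen2.5:0.5b"               # the model this project uses

def build_generate_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt: str) -> str:
    """POST a prompt to /api/generate and return the model's reply text."""
    body = json.dumps(build_generate_payload(MODEL, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# usage (requires the server to be running):
#   ask_ollama("Reply with the single word: ready")
```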

Deploy MariaDB

  1. Make sure that Docker is installed first; if not, follow this guide (for Ubuntu): https://docs.docker.com/engine/install/ubuntu/
  2. Create the directory for the deployment files.
    mkdir mariadb
    cd mariadb
  3. Save the docker compose and environment variable files.
    • docker-compose.yml
    	services:
    	  mariadb:
    	    image: mariadb
    	    environment:
    	      MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
    	      MYSQL_USER: ${MYSQL_USER}
    	      MYSQL_PASSWORD: ${MYSQL_PASSWORD}
    	      MYSQL_DATABASE: ${MYSQL_DATABASE}
    	    ports:
    	      - "3376:3306"
    	    volumes:
    	      - "./data.sql:/docker-entrypoint-initdb.d/1.sql"
    • .env
    MYSQL_ROOT_PASSWORD=scrapy_root_pwd
    MYSQL_USER=scrapy_user
    MYSQL_PASSWORD=scrapy_user_pwd
    MYSQL_DATABASE=scrapy_db
  4. Deploy the MariaDB docker instance. With the settings above, it creates the database and the user needed for the project.
    docker compose up -d
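The database can then be checked from Python. A minimal sketch, assuming the sample credentials above, the 3376→3306 port mapping, and that the check runs on the docker host (the PyMySQL driver and host value are assumptions, not part of the project):

```python
# Quick sanity check for the MariaDB container deployed above.
DB_SETTINGS = {
    "host": "127.0.0.1",   # assumption: checking from the docker host itself
    "port": 3376,          # host port mapped to the container's 3306
    "user": "scrapy_user",
    "password": "scrapy_user_pwd",
    "database": "scrapy_db",
}

def check_mariadb(settings: dict = DB_SETTINGS) -> str:
    """Connect and return the server version string."""
    import pymysql  # lazy import: pip install pymysql
    with pymysql.connect(**settings) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT VERSION()")
            return cur.fetchone()[0]
```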

Deploy Apache Kafka

  1. Make sure a Java JDK is installed.

    sudo apt update
    sudo apt install default-jdk
  2. Download & extract the Kafka server from the official site (https://kafka.apache.org/downloads).

    wget https://dlcdn.apache.org/kafka/3.9.0/kafka_{version}.tgz
    tar -xvf kafka_{version}.tgz
    mv kafka_{version} kafka
    # update folder permissions
    chmod -R 777 kafka
  3. Create data directory inside kafka one.

    cd kafka
    mkdir -p data/{zookeeper,kafka}
    chmod -R 777 data
  4. Edit the ZooKeeper and Kafka configuration files

    vim config/zookeeper.properties
    	dataDir=/home/user/kafka/data/zookeeper
    
    vim config/server.properties
    	log.dirs=/home/user/kafka/data/kafka
    	listeners=PLAINTEXT://<ip>:9092
    
  5. Run the ZooKeeper and Kafka instances

    cd ~/kafka/bin
    # use screen to run in background
    ./zookeeper-server-start.sh ../config/zookeeper.properties
    cd ~/kafka/bin
    # use screen to run in background
    ./kafka-server-start.sh ../config/server.properties
  6. Create topics

    cd ~/kafka/bin
    ./kafka-topics.sh --create --bootstrap-server <ip>:9092 --replication-factor 1 --partitions 1 --topic <topic-name>
    
    # check list of kafka topics
    ./kafka-topics.sh --list --bootstrap-server <ip>:9092 
  7. Run producers and consumers to test Kafka setup

    # producer
    cd ~/kafka/bin
    ./kafka-console-producer.sh --bootstrap-server <ip>:9092 --topic <topic-name>
    # consumer
    cd ~/kafka/bin
    ./kafka-console-consumer.sh --bootstrap-server <ip>:9092 --topic <topic-name>
  8. Use the following commands to stop ZooKeeper and Kafka

    cd ~/kafka/bin
    ./zookeeper-server-stop.sh
    ./kafka-server-stop.sh
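The producer/consumer round trip from steps 6-7 can also be scripted from Python. A minimal sketch, assuming the kafka-python client (`pip install kafka-python`); the bootstrap address and topic name are placeholder/example values matching the console commands above:

```python
# Round-trip test against the Kafka broker configured above.
BOOTSTRAP = "192.168.0.223:9092"  # example value: your broker's listener IP
TOPIC = "papers"                   # hypothetical topic name

def produce(messages, bootstrap=BOOTSTRAP, topic=TOPIC):
    """Send a list of strings to the topic."""
    from kafka import KafkaProducer  # lazy import: needs kafka-python
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for msg in messages:
        producer.send(topic, msg.encode("utf-8"))
    producer.flush()
    producer.close()

def consume_some(limit=5, bootstrap=BOOTSTRAP, topic=TOPIC):
    """Read up to `limit` messages from the beginning of the topic."""
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap,
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,  # stop polling after 5s of silence
    )
    out = []
    for record in consumer:
        out.append(record.value.decode("utf-8"))
        if len(out) >= limit:
            break
    consumer.close()
    return out
```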

Deploy Redis

  1. Make sure that docker is installed first.
  2. Create the directory for the deployment files.
    mkdir redis
    cd redis
  3. Save the docker compose file.
    • docker-compose.yml
    	services:
    	  redis:
    	    image: redis:latest
    	    command: ["redis-server"]
    	    ports:
    	      - 6379:6379
  4. Deploy Redis docker instance.
    docker compose up -d
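The Redis instance can be verified from Python. A minimal sketch, assuming the redis-py client (`pip install redis`) and that the check runs on the docker host (the host value and `broker_url` helper are assumptions, not project code); the same URL shape is what the Flower `.env` file expects later:

```python
# Ping the Redis container deployed above.
REDIS_HOST = "127.0.0.1"  # assumption: checking from the docker host
REDIS_PORT = 6379          # matches the compose mapping 6379:6379

def broker_url(host=REDIS_HOST, port=REDIS_PORT, db=0):
    """Celery-style broker URL, as used in the Flower .env file."""
    return f"redis://{host}:{port}/{db}"

def check_redis(host=REDIS_HOST, port=REDIS_PORT):
    import redis  # lazy import so the module loads without redis-py
    return redis.Redis(host=host, port=port).ping()  # True when reachable
```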

Deploy Kafka UI

  1. Make sure that docker is installed first.
  2. Create the directory for the deployment files.
    mkdir kafka_ui
    cd kafka_ui
  3. Save the docker compose file.
    • docker-compose.yml
    	services:
    	  kafka-ui:
    	    container_name: kafka-ui
    	    image: provectuslabs/kafka-ui:master
    	    ports:
    	      - 8080:8080
    	    environment:
    	      KAFKA_CLUSTERS_0_NAME: testing_cluster
    	      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: 192.168.0.223:9092
  4. Deploy Kafka-UI docker instance.
    docker compose up -d
  5. Kafka-UI can be accessed on port 8080 at the IP it is deployed on.

Deploy Flower

  1. Make sure that docker is installed first.
  2. Create the directory for the deployment files.
    mkdir flower
    cd flower
  3. Save the docker compose and environment variable files.
    • docker-compose.yml
    	services:
    	  flower:
    	    image: mher/flower
    	    container_name: flower
    	    env_file:
    	      - ${ENV_FILE:-.env}
    	    environment:
    	      - FLOWER_PORT=5555
    	      - FLOWER_PERSISTENT=True
    	      - FLOWER_STATE_SAVE_INTERVAL=10000
    	      - FLOWER_DB=/etc/db/flower.db
    	      - TZ=Asia/Seoul
    	    ports:
    	      - "5555:5555"
    	    volumes:
    	      - ./flower/storage:/etc/db/
    	    user: root
    • .env
    	# CELERY_BROKER_URL="redis://<redis_server_ip>:<port>/0"
    	CELERY_BROKER_URL="redis://192.168.0.223:6379/0"
  4. Deploy Flower docker instance.
    docker compose up -d
  5. Flower can be accessed on port 5555 at the IP it is deployed on.

Deploy the Anyang Research assistance project (Flask & Celery)

  1. Make sure that docker is installed first.
  2. Copy and unzip the project code zip file on the device.
    unzip anyang-project-main.zip
    cd anyang-project-main
  3. The only change that needs to be made is to set the MariaDB, Kafka, Redis & Ollama server IPs & ports in the .env file.
    # DB_HOST='<maria_db_server_ip>'
    DB_HOST='192.168.0.222'
    DB_USER='scrapy_user'
    DB_PWD='scrapy_user_pwd'
    DB_NAME='scrapy_db'
    DB_PORT=3376
    # KAFKA_BROKER_IP='<kafka_broker_server_ip>'
    KAFKA_BROKER_IP='192.168.0.223'
    KAFKA_BROKER_PORT=9092
    # CELERY_REDIS_IP='<redis_server_ip>'
    CELERY_REDIS_IP='192.168.0.223'
    CELERY_REDIS_PORT=6379
    # OLLAMA_SERVER_IP='<ollama_server_ip>'
    OLLAMA_SERVER_IP='192.168.0.224'
    OLLAMA_SERVER_PORT=11500
  4. Deploy Anyang Flask & Celery docker instances.
    # this one needs to be built because it's a custom image
    docker compose up -d --build
  5. The system can be accessed on port 5004 at the IP it is deployed on.
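The .env values above plug together into the service URLs the backend needs. A minimal sketch of that wiring, assuming the variable names from the sample file are read via `os.environ` (the `service_urls` helper itself is hypothetical, not the project's actual code):

```python
# How the sample .env values compose into broker/server addresses.
import os

def service_urls(env=os.environ):
    """Assemble service addresses from the .env variables shown above."""
    return {
        "celery_broker": "redis://{}:{}/0".format(
            env.get("CELERY_REDIS_IP", "192.168.0.223"),
            env.get("CELERY_REDIS_PORT", "6379"),
        ),
        "kafka_bootstrap": "{}:{}".format(
            env.get("KAFKA_BROKER_IP", "192.168.0.223"),
            env.get("KAFKA_BROKER_PORT", "9092"),
        ),
        "ollama_base": "http://{}:{}".format(
            env.get("OLLAMA_SERVER_IP", "192.168.0.224"),
            env.get("OLLAMA_SERVER_PORT", "11500"),
        ),
    }
```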

Troubleshooting

  • Database table creation race condition between Flask & Celery on the first deployment.
    • Both Flask & Celery share the same application context (which is needed for integration), and the startup code creates all the tables in the database if they don't exist. On the initial deployment, Flask and Celery can therefore race to create the tables, which may fail either the Flask or the Celery docker container. The simple fix is to start them up again using the following command.
       	docker compose up -d
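A generic mitigation for this kind of startup race is to retry the table-creation step instead of letting the losing container crash. A minimal sketch; `retry` is a hypothetical helper, and the callable passed to it stands in for whatever table-creation call the project actually makes:

```python
# Retry wrapper: whichever process loses the race waits and tries again.
import time

def retry(fn, attempts=5, delay=2.0):
    """Call fn, retrying on any exception with a fixed delay between tries."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # give up after the final attempt
            time.sleep(delay)

# usage sketch: retry(lambda: db.create_all(), attempts=5, delay=2.0)
# (db.create_all is a stand-in for the project's table-creation call)
```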
