This project scrapes the arXiv site for papers, then uses an LLM (the Qwen 2.5 0.5b model) to filter them and show the user a word cloud and a bar chart of extracted keywords. Additionally, the top 5 papers are shown based on the scores they receive.
- anyang-scrapy: includes the Scrapy codebase to scrape the target site
- arXiv scraping API: `http://export.arxiv.org/api/query?search_query=all:<add_keyword_here>&start=0&max_results=<n>`
  - Example (keyword => machine learning): http://export.arxiv.org/api/query?search_query=all:machine+learning&start=0&max_results=50
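For reference, the example query above can be reproduced in Python. This is only an illustrative sketch (the helper name is not from the repo); it shows how the keyword, `start`, and `max_results` parameters map onto the URL:

```python
from urllib.parse import quote_plus

ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_query(keyword: str, start: int = 0, max_results: int = 50) -> str:
    """Build an arXiv export API query URL for a free-text keyword search."""
    # quote_plus turns spaces into '+', matching the example URL above
    return (
        f"{ARXIV_API}?search_query=all:{quote_plus(keyword)}"
        f"&start={start}&max_results={max_results}"
    )

print(build_arxiv_query("machine learning"))
# → http://export.arxiv.org/api/query?search_query=all:machine+learning&start=0&max_results=50
```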
- anyang-flask: includes the Flask backend to manage services and integrate the system components
- anyang-llm: includes initial code to interact with the Qwen LLM
Team Members:
- Duguma Yeshitla
- 천영학
- 김현주
- 이채림
- Apache Kafka
- Zookeeper
- Redis
- Celery
- Ollama
- MariaDB
- Docker
- Kafka UI
- Flower for Celery
- DBeaver
- Install Ollama on the device following the steps at https://ollama.com/download.
- The default Ollama instance is not accessible from external IPs, so it has to be stopped. Here are the steps to achieve that.
- Make sure the default Ollama instance is not running. By default it runs on localhost at port 11434, and it may block the new Ollama instance you will be running.

```bash
sudo systemctl status ollama
sudo systemctl stop ollama
sudo systemctl disable ollama
```

- Also make sure no other Ollama instance is running; if there is one, kill it.

```bash
# check for existing ollama instances
ps aux | grep ollama
# stop existing ollama instances if there are any
sudo kill -9 <process_id>
```

- Run `ollama serve` bound to all interfaces (i.e. accessible from anywhere).

```bash
OLLAMA_HOST=0.0.0.0:11500 ollama serve
```

- Check that the models needed are available on the 0.0.0.0 instance; if not, pull them in this mode. The `ollama serve` command above must be running for the following to work.

```bash
# check the list of models
OLLAMA_HOST=0.0.0.0:11500 ollama list
# pull a model if it does not exist (e.g. qwen3:0.6b)
OLLAMA_HOST=0.0.0.0:11500 ollama pull <model:size>
# for this project we need the Qwen 2.5 0.5b model
OLLAMA_HOST=0.0.0.0:11500 ollama pull qwen2.5:0.5b
```
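Once the server is up, the backend can talk to it over Ollama's REST API (the `/api/generate` endpoint and its `model`/`prompt`/`stream` fields are part of Ollama's documented HTTP API; the helper names and the server address here are illustrative, not the project's actual code):

```python
import json
import urllib.request

# the serve address configured above (illustrative; adjust to your deployment)
OLLAMA_URL = "http://192.168.0.224:11500"

def build_generate_request(prompt: str, model: str = "qwen2.5:0.5b") -> dict:
    """Payload for Ollama's /api/generate endpoint (stream=False returns one JSON object)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    """Send a one-shot prompt to the Ollama server and return the response text."""
    payload = json.dumps(build_generate_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```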
- Make sure that Docker is installed first; if not, follow this link (for Ubuntu): https://docs.docker.com/engine/install/ubuntu/
- Create the directory for the deployment files.

```bash
mkdir mariadb
cd mariadb
```

- Save the Docker Compose and environment variable files.

`docker-compose.yml`

```yaml
services:
  mariadb:
    image: mariadb
    environment:
      MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
      MYSQL_USER: ${MYSQL_USER}
      MYSQL_PASSWORD: ${MYSQL_PASSWORD}
      MYSQL_DATABASE: ${MYSQL_DATABASE}
    ports:
      - "3376:3306"
    volumes:
      - "./data.sql:/docker-entrypoint-initdb.d/1.sql"
```

`.env`

```bash
MYSQL_ROOT_PASSWORD=scrapy_root_pwd
MYSQL_USER=scrapy_user
MYSQL_PASSWORD=scrapy_user_pwd
MYSQL_DATABASE=scrapy_db
```

- Deploy the MariaDB Docker instance. Based on the settings above, it will create the database and user needed for the project.

```bash
docker compose up -d
```
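Note that the compose file maps host port 3376 to MariaDB's default 3306, so clients must connect on 3376. As an illustrative sketch (not the project's actual code; the variable names follow the files above and the project's deployment `.env`), application code might assemble its connection parameters like this:

```python
import os

def mariadb_conn_params() -> dict:
    """Connection parameters matching the Docker Compose mapping above.

    Defaults mirror the .env shown above; real deployments should
    override them via environment variables.
    """
    return {
        "host": os.environ.get("DB_HOST", "127.0.0.1"),
        # 3376 is the HOST port; inside the container MariaDB still listens on 3306
        "port": int(os.environ.get("DB_PORT", "3376")),
        "user": os.environ.get("MYSQL_USER", "scrapy_user"),
        "password": os.environ.get("MYSQL_PASSWORD", "scrapy_user_pwd"),
        "database": os.environ.get("MYSQL_DATABASE", "scrapy_db"),
    }
```

These parameters can then be passed to any MySQL-compatible driver (e.g. `pymysql.connect(**mariadb_conn_params())`).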
- Make sure the Java JDK is installed.

```bash
sudo apt update
sudo apt install default-jdk
```

- Download & extract the Kafka server from the official site (https://kafka.apache.org/downloads).

```bash
wget https://dlcdn.apache.org/kafka/3.9.0/kafka_{version}.tgz
tar -xvf kafka_{version}.tgz
mv kafka_{version} kafka
# update folder permissions
chmod 777 -R kafka
```
- Create the data directories inside the kafka folder.

```bash
cd kafka
mkdir -p data/{zookeeper,kafka}
chmod 777 -R data
```
- Edit the Zookeeper and Kafka configuration files.

```bash
vim config/zookeeper.properties
```

```properties
dataDir=/home/user/kafka/data/zookeeper
```

```bash
vim config/server.properties
```

```properties
log.dirs=/home/user/kafka/data/kafka
listeners=PLAINTEXT://<ip>:9092
```
- Run the Zookeeper and Kafka instances.

```bash
cd /kafka/bin
# use screen to run in background
./zookeeper-server-start.sh ../config/zookeeper.properties
```

```bash
cd /kafka/bin
# use screen to run in background
./kafka-server-start.sh ../config/server.properties
```
- Create topics.

```bash
cd /kafka/bin
./kafka-topics.sh --create --bootstrap-server <ip>:9092 --replication-factor 1 --partitions 1 --topic <topic-name>
# check the list of kafka topics
./kafka-topics.sh --list --bootstrap-server <ip>:9092
```
- Run a producer and a consumer to test the Kafka setup.

```bash
# producer
cd /kafka/bin
./kafka-console-producer.sh --bootstrap-server <ip>:9092 --topic <topic-name>
```

```bash
# consumer
cd /kafka/bin
./kafka-console-consumer.sh --bootstrap-server <ip>:9092 --topic <topic-name>
```
- Use the following commands to stop Zookeeper and Kafka.

```bash
cd /kafka/bin
./zookeeper-server-stop.sh
./kafka-server-stop.sh
```
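In the application itself, messages on these topics would be produced and consumed programmatically (e.g. via the third-party `kafka-python` package, passing helpers like these as `value_serializer`/`value_deserializer`). The JSON message format below is an assumption for illustration, not confirmed from the repo:

```python
import json

def serialize(record: dict) -> bytes:
    """Encode a record as UTF-8 JSON bytes before sending it to a topic."""
    return json.dumps(record, ensure_ascii=False).encode("utf-8")

def deserialize(raw: bytes) -> dict:
    """Decode a message received from a topic back into a dict."""
    return json.loads(raw.decode("utf-8"))

# round-trip example with a hypothetical paper record
msg = serialize({"title": "Attention Is All You Need", "score": 5})
print(deserialize(msg)["title"])
```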
- Make sure that docker is installed first.
- Create the directory for the deployment files.
```bash
mkdir redis
cd redis
```

- Save the Docker Compose file.

`docker-compose.yml`

```yaml
services:
  redis:
    image: redis:latest
    command: ["redis-server"]
    ports:
      - 6379:6379
```

- Deploy the Redis Docker instance.

```bash
docker compose up -d
```
- Make sure that docker is installed first.
- Create the directory for the deployment files.
```bash
mkdir kafka_ui
cd kafka_ui
```

- Save the Docker Compose file.

`docker-compose.yml`

```yaml
services:
  kafka-ui:
    container_name: kafka-ui
    image: provectuslabs/kafka-ui:master
    ports:
      - 8080:8080
    environment:
      KAFKA_CLUSTERS_0_NAME: testing_cluster
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: 192.168.0.223:9092
```

- Deploy the Kafka-UI Docker instance.

```bash
docker compose up -d
```

- Kafka-UI can be accessed on port 8080 at the IP it is deployed on.

- Make sure that docker is installed first.
- Create the directory for the deployment files.
```bash
mkdir flower
cd flower
```

- Save the Docker Compose and environment variable files.

`docker-compose.yml`

```yaml
services:
  flower:
    image: mher/flower
    container_name: flower
    env_file:
      - ${ENV_FILE:-.env}
    environment:
      - FLOWER_PORT=5555
      - FLOWER_PERSISTENT=True
      - FLOWER_STATE_SAVE_INTERVAL=10000
      - FLOWER_DB=/etc/db/flower.db
      - TZ=Asia/Seoul
    ports:
      - "5555:5555"
    volumes:
      - ./flower/storage:/etc/db/
    user: root
```

`.env`

```bash
# CELERY_BROKER_URL="redis://<redis_server_ip>:<port>/0"
CELERY_BROKER_URL="redis://192.168.0.223:6379/0"
```

- Deploy the Flower Docker instance.

```bash
docker compose up -d
```

- Flower can be accessed on port 5555 at the IP it is deployed on.
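The `CELERY_BROKER_URL` above follows the `redis://<host>:<port>/<db>` scheme that both Celery and Flower expect. A small illustrative helper (not from the repo) makes the structure explicit:

```python
def redis_broker_url(ip: str, port: int = 6379, db: int = 0) -> str:
    """Build a Redis broker URL in the form Celery and Flower expect."""
    # trailing /<db> selects the Redis logical database (0 by default)
    return f"redis://{ip}:{port}/{db}"

print(redis_broker_url("192.168.0.223"))
# → redis://192.168.0.223:6379/0
```

This is the same value that would be passed as `broker=` when constructing the Celery app in the backend.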

- Make sure that docker is installed first.
- Copy and unzip the project code zip file in the device.
unzip anyang-project-main.zip cd anyang-project-main - The only change that needs to be maid is to set the Ollama, Redis & Kafka server IPs & ports in the
.envfile.# DB_HOST='<maria_db_server_ip>' DB_HOST='192.168.0.222' DB_USER='scrapy_user' DB_PWD='scrapy_user_pwd' DB_NAME='scrapy_db' DB_PORT=3376 # KAFKA_BROKER_IP='<kafka_broker_server_ip>' KAFKA_BROKER_IP='192.168.0.223' KAFKA_BROKER_PORT=9092 # CELERY_REDIS_IP='<redis_server_ip>' CELERY_REDIS_IP='192.168.0.223' CELERY_REDIS_PORT=6379 # OLLAMA_SERVER_IP='<ollama_server_ip>' OLLAMA_SERVER_IP='192.168.0.224' OLLAMA_SERVER_PORT=11500
- Deploy Anyang Flask & Celery docker instances.
# need to build this one because its a custom image docker compose up -d --build - The system can be accessed on port 5004 with the IP it's deployed with.
- Database table creation race condition between Flask & Celery on the first deployment.
  - This occurs because both Flask & Celery work with the same application context (which is needed for integration), and there is code that creates all the tables in the database if they don't exist. So on the initial deployment of the project there can be a race condition between Flask & Celery to create the tables, which may cause either the Flask or the Celery Docker container to fail. The simple fix is to start them up again using the following command.

```bash
docker compose up -d
```
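A more permanent fix would be to make table creation tolerant of the race. This is a minimal sketch of one such retry loop, assuming a `create_all`-style callable such as SQLAlchemy's `db.create_all()`; it is not the project's actual code:

```python
import time

def create_tables_with_retry(create_all, attempts: int = 3, delay: float = 2.0) -> None:
    """Retry table creation so whichever of Flask/Celery loses the race recovers.

    `create_all` stands in for e.g. SQLAlchemy's db.create_all() (illustrative).
    """
    for i in range(attempts):
        try:
            create_all()
            return
        except Exception:
            if i == attempts - 1:
                raise  # give up after the last attempt
            # give the other container time to finish its DDL, then retry
            time.sleep(delay)
```

With this in place, the container that hits "table already exists" (or a mid-creation conflict) simply retries instead of crashing.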