Complete implementation of multi-process Video AI Analysis Tool

- Multi-process architecture: web, backend, analysis/training workers
- SQLite database for persistent configuration and system prompts
- Configurable CUDA/ROCm backends with command line override
- TCP socket-based inter-process communication
- Web interface with comprehensive configuration management
- GPLv3 licensing with copyright notices on all files
- Complete documentation: README, architecture docs, changelog
- Build and deployment scripts for different GPU backends
- Git repository setup with .gitignore for build artifacts
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
*.manifest
*.spec
# Virtual environments
venv/
venv-*/
env/
ENV/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Logs
*.log
logs/
# Database
*.db
*.sqlite
*.sqlite3
# Temporary files
*.tmp
*.temp
temp/
tmp/
# Build artifacts
vidai-backend
vidai-web
vidai-analysis-*
vidai-training-*
# Result files
/tmp/vidai_results/
# Config (but keep structure)
/home/*/.config/vidai/
~/.config/vidai/
# Changelog
All notable changes to the Video AI Analysis Tool will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Multi-process architecture with separate web, backend, and worker processes
- Configurable CUDA/ROCm backend selection for analysis and training
- TCP socket-based inter-process communication
- Web-based configuration interface
- Self-contained build system with PyInstaller
- Comprehensive documentation and README
- GPLv3 licensing and copyright notices
### Changed
- Refactored monolithic Flask app into distributed processes
- Replaced direct analysis calls with message-passing architecture
- Updated build scripts to generate multiple executables
- Improved error handling and process management
### Technical Details
- Implemented socket-based communication protocol
- Added configuration management system
- Created worker registration and routing system
- Added file-based result storage for reliability
- Implemented graceful shutdown and process monitoring
## [0.1.0] - 2024-10-05
### Added
- Initial release of Video AI Analysis Tool
- Web interface for image/video analysis
- Qwen2.5-VL model integration
- Frame extraction and video processing
- Model training capabilities
- CUDA/ROCm support via separate requirements
- Basic build and setup scripts
### Features
- Upload and analyze images and videos
- Automatic frame extraction from videos
- AI-powered scene description and summarization
- Fine-tune models on custom datasets
- GPU memory monitoring
- Progress tracking and cancellation
### Infrastructure
- Flask web framework
- PyTorch with CUDA/ROCm support
- Transformers library integration
- OpenCV for video processing
- PyInstaller for executable builds
# Video AI Analysis Tool
A multi-process web-based tool for analyzing images and videos using AI models. Supports frame extraction, activity detection, video segmentation, and model training with configurable CUDA/ROCm backends.
## Features
- **Web Interface**: User-friendly web UI for uploading and analyzing media
- **AI Analysis**: Powered by Qwen2.5-VL models for image/video understanding
- **Multi-Process Architecture**: Separate processes for web, backend, and workers
- **Backend Selection**: Choose between CUDA and ROCm for analysis/training
- **Video Processing**: Automatic frame extraction and summarization
- **Model Training**: Fine-tune models on custom datasets
- **Configuration Management**: SQLite database for persistent settings and system prompts
- **Self-Contained**: No external dependencies beyond Python and system libraries
## Architecture
The application consists of four main components:
1. **Web Interface Process**: Flask-based UI server
2. **Backend Process**: Request routing and worker management
3. **Analysis Workers**: CUDA and ROCm variants for media analysis
4. **Training Workers**: CUDA and ROCm variants for model training
Communication between processes uses TCP sockets for reliability and self-containment.
## Requirements
- Python 3.8+
- PyTorch (CUDA or ROCm)
- Flask
- Transformers
- OpenCV
- Other dependencies listed in requirements files
## Installation
1. Clone the repository:
```bash
git clone <repository-url>
cd videotest
```
2. Set up virtual environment:
```bash
./setup.sh cuda # or ./setup.sh rocm
source venv-cuda/bin/activate # or venv-rocm
```
3. Build executables (optional):
```bash
./build.sh cuda # or ./build.sh rocm
```
## Usage
### Command Line Options
All command line options can be configured in the database and overridden at runtime:
```bash
python vidai.py [options]
```
Options:
- `--model MODEL`: Default model path (default: Qwen/Qwen2.5-VL-7B-Instruct)
- `--dir DIR`: Allowed directory for local file access
- `--optimize`: Optimize frame extraction (resize to 640px width)
- `--ffmpeg`: Force use of ffmpeg for frame extraction
- `--flash`: Enable Flash Attention 2
- `--analysis-backend {cuda,rocm}`: Backend for analysis
- `--training-backend {cuda,rocm}`: Backend for training
- `--host HOST`: Host to bind server to (default: 0.0.0.0)
- `--port PORT`: Port to bind server to (default: 5000)
- `--debug`: Enable debug mode
Command line options override database settings and are saved for future runs.
### Development Mode
1. Start all processes:
```bash
./start.sh cuda # or ./start.sh rocm
# Or run directly:
python vidai.py --analysis-backend cuda
```
2. Open browser to `http://localhost:5000`
### Production Mode
Use the built executables from `dist/` directory.
## Configuration
- Access the configuration page at `/config` in the web interface
- Select preferred backend (CUDA/ROCm) for analysis and training
- Configure system prompts, models, and processing options
- All settings are saved to SQLite database at `~/.config/vidai/vidai.db`
- Command line options override and update database settings
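As a quick check, the stored settings can be inspected directly with the `sqlite3` command-line tool (the `config` table holds simple key/value pairs):

```bash
sqlite3 ~/.config/vidai/vidai.db "SELECT key, value FROM config;"
```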
## API
The backend communicates via TCP sockets:
- Web interface: localhost:5001
- Workers: localhost:5002
Message format: JSON with `msg_type`, `msg_id`, and `data` fields.
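As a minimal sketch (assuming the `vidai` package is on `PYTHONPATH`), a client can query the backend using the helpers from `vidai/comm.py`:

```python
import uuid

from vidai.comm import SocketCommunicator, Message

# Ask the backend (web-facing port 5001) for the current configuration.
comm = SocketCommunicator(port=5001)
comm.connect()
comm.send_message(Message('get_config', str(uuid.uuid4()), {}))
response = comm.receive_message()  # Message('config_response', ...) or None
print(response.data if response else 'no response')
comm.close()
```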
## Development
### Project Structure
```
videotest/
├── vidai/ # Main package
│ ├── __init__.py
│ ├── backend.py # Backend process
│ ├── web.py # Web interface process
│ ├── worker_analysis.py # Analysis worker
│ ├── worker_training.py # Training worker
│ ├── comm.py # Communication utilities
│ └── config.py # Configuration management
├── templates/ # Flask templates
├── static/ # Static files
├── requirements*.txt # Dependencies
├── build.sh # Build script
├── start.sh # Startup script
├── setup.sh # Setup script
├── clean.sh # Clean script
├── LICENSE # GPLv3 license
└── README.md # This file
```
### Adding New Features
1. Define message types in `comm.py`
2. Implement handlers in backend and workers
3. Update web interface routes
4. Add configuration options if needed
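For instance, a hypothetical `ping` message type (not part of the current protocol) would need only a new branch in `handle_web_message()` in `vidai/backend.py`:

```python
# Hypothetical example -- 'ping'/'pong' are not existing message types.
elif message.msg_type == 'ping':
    return Message('pong', message.msg_id, {'status': 'ok'})
```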
## License
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
See [LICENSE](LICENSE) for details.
Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make changes
4. Test thoroughly
5. Submit a pull request
## Support
For issues and questions, please open a GitHub issue or contact the maintainer.
#!/bin/bash
# Video AI Build Script
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
# Build script for Video AI Analysis Tool
# Creates self-contained executables for each component using PyInstaller
TARGET=${1:-cuda} # Default to cuda
echo "Building Video AI Analysis Tool for $TARGET..."
# Check if pyinstaller is installed
if ! command -v pyinstaller &> /dev/null; then
echo "PyInstaller not found. Installing..."
pip install pyinstaller
fi
# Build backend
pyinstaller --onefile \
--name vidai-backend \
--hidden-import torch \
--hidden-import transformers \
vidai/backend.py
# Build web interface
pyinstaller --onefile \
--name vidai-web \
--add-data "templates:templates" \
--add-data "static:static" \
--hidden-import flask \
vidai/web.py
# Build analysis worker
pyinstaller --onefile \
--name vidai-analysis-$TARGET \
--hidden-import torch \
--hidden-import transformers \
--hidden-import cv2 \
vidai/worker_analysis.py
# Build training worker
pyinstaller --onefile \
--name vidai-training-$TARGET \
--hidden-import torch \
--hidden-import transformers \
vidai/worker_training.py
echo "Build complete for $TARGET!"
echo "Executables created in dist/:"
echo " - vidai-backend"
echo " - vidai-web"
echo " - vidai-analysis-$TARGET"
echo " - vidai-training-$TARGET"
#!/bin/bash
# Video AI Clean Script
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
# Clean script for Video AI Analysis Tool
# Removes PyInstaller build artifacts
echo "Cleaning build artifacts..."
# Remove PyInstaller directories
if [ -d "dist" ]; then
rm -rf dist
echo "Removed dist/ directory"
fi
if [ -d "build" ]; then
rm -rf build
echo "Removed build/ directory"
fi
# Remove .spec file
if [ -f "vidai.spec" ]; then
rm vidai.spec
echo "Removed vidai.spec"
fi
# Remove virtual environments
for venv in venv venv-cpu venv-cuda venv-rocm; do
if [ -d "$venv" ]; then
rm -rf "$venv"
echo "Removed $venv/ directory"
fi
done
# Remove __pycache__ directories
find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
echo "Removed __pycache__ directories"
echo "Clean complete!"
# Architecture Documentation
## Overview
The Video AI Analysis Tool is designed as a multi-process application to provide scalability, fault isolation, and flexible backend selection for GPU acceleration.
## Process Architecture
### Components
1. **Web Interface Process** (`vidai/web.py`)
- Flask-based HTTP server
- Serves the user interface
- Communicates with backend via TCP socket (port 5001)
- Handles file uploads and result polling
2. **Backend Process** (`vidai/backend.py`)
- Central message router
- Manages worker registration and task distribution
- Listens on two ports:
- 5001: Web interface communication
- 5002: Worker communication
- Routes requests based on configured backends
3. **Analysis Workers** (`vidai/worker_analysis.py`)
- CUDA variant: Processes analysis requests using CUDA acceleration
- ROCm variant: Processes analysis requests using ROCm acceleration
- Connect to backend on port 5002
- Handle image/video analysis using Qwen2.5-VL models
4. **Training Workers** (`vidai/worker_training.py`)
- CUDA variant: Handles model training with CUDA
- ROCm variant: Handles model training with ROCm
- Connect to backend on port 5002
- Execute training scripts with appropriate GPU backend
### Communication Protocol
All inter-process communication uses TCP sockets with JSON messages:
```json
{
"msg_type": "analyze_request",
"msg_id": "uuid-string",
"data": {
"model_path": "Qwen/Qwen2.5-VL-7B-Instruct",
"prompt": "Describe this image",
"local_path": "/path/to/media",
"interval": 10
}
}
```
#### Message Types
- `analyze_request`: Web to backend, analysis job
- `train_request`: Web to backend, training job
- `config_update`: Web to backend, update configuration
- `get_config`: Web to backend, retrieve current config
- `register`: Worker to backend, register worker type
- `analyze_response`: Worker to backend, analysis result
- `train_response`: Worker to backend, training result
- `config_response`: Backend to web, configuration data
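For example, an analysis worker announces itself with a `register` message immediately after connecting, mirroring the code in `vidai/worker_analysis.py`:

```python
from vidai.comm import SocketCommunicator, Message

# Workers connect to the backend's worker port and register their type;
# the backend routes jobs using keys such as 'analysis_cuda' or 'training_rocm'.
comm = SocketCommunicator(port=5002)
comm.connect()
comm.send_message(Message('register', 'register', {'type': 'analysis_cuda'}))
```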
### Configuration Management
- Stored in an SQLite database at `~/.config/vidai/vidai.db`
- Managed by `vidai/config.py`
- Allows selection of CUDA/ROCm for analysis and training independently
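A short sketch of reading and persisting a setting through `vidai/config.py`:

```python
from vidai.config import get_analysis_backend, set_training_backend

set_training_backend('rocm')   # persisted to the SQLite database
print(get_analysis_backend())  # 'cuda' unless previously changed
```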
### Data Flow
1. User uploads media via web interface
2. Web process sends request to backend
3. Backend routes to appropriate worker based on config
4. Worker processes request and sends result back
5. Backend forwards result to web process
6. Web process displays result to user
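Worker results are persisted by the backend as `/tmp/vidai_results/<msg_id>.json`, so the web process can recover them even if a socket drops. A minimal polling helper (a sketch, not the exact code in `vidai/web.py`) might look like:

```python
import json
import os
import time

def wait_for_result(msg_id: str, timeout: float = 300.0) -> dict:
    """Poll the file-based result store until the worker's response appears."""
    path = os.path.join('/tmp/vidai_results', f'{msg_id}.json')
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            with open(path) as f:
                return json.load(f)
        time.sleep(0.5)
    raise TimeoutError(f'no result for {msg_id}')
```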
### Error Handling
- Socket timeouts and reconnections
- Worker registration and health checks
- Graceful degradation when workers unavailable
- File-based result storage for reliability
### Security Considerations
- Local-only TCP sockets (localhost)
- No authentication (single-user assumption)
- File system access restricted to configured directories
- Input validation on all message data
## Deployment
### Development
Use `start.sh` to launch all processes manually for development and debugging.
### Production
Build executables with `build.sh` and deploy the resulting binaries from `dist/`.
### Scaling
- Multiple worker instances can be started for load balancing
- Backend can distribute requests across available workers
- Web interface can be load balanced independently
## Future Enhancements
- Message queue system (Redis/RabbitMQ) for better scalability
- Authentication and multi-user support
- REST API for programmatic access
- Containerization with Docker
- Monitoring and metrics collection
--extra-index-url https://download.pytorch.org/whl/cu118
Flask>=2.0.0
torch>=2.0.0
transformers>=4.30.0
opencv-python>=4.5.0
psutil>=5.8.0
pynvml>=11.0.0
flash-attn>=2.0.0
pyinstaller>=5.0.0
--extra-index-url https://download.pytorch.org/whl/rocm5.6
Flask>=2.0.0
torch>=2.0.0
transformers>=4.30.0
opencv-python>=4.5.0
psutil>=5.8.0
pynvml>=11.0.0
pyinstaller>=5.0.0
Flask>=2.0.0
torch>=2.0.0
transformers>=4.30.0
opencv-python>=4.5.0
psutil>=5.8.0
pynvml>=11.0.0
flash-attn>=2.0.0
pyinstaller>=5.0.0
#!/bin/bash
# Video AI Setup Script
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
# Setup script for Video AI Analysis Tool
# Creates a virtual environment and installs dependencies
# Parse arguments
TARGET="cpu"
if [ "$1" = "cuda" ]; then
TARGET="cuda"
elif [ "$1" = "rocm" ]; then
TARGET="rocm"
fi
echo "Setting up Video AI Analysis Tool for $TARGET..."
# Create virtual environment
if [ ! -d "venv-$TARGET" ]; then
python3 -m venv venv-$TARGET
echo "Created virtual environment in venv-$TARGET/"
else
echo "Virtual environment venv-$TARGET already exists"
fi
# Activate virtual environment
source venv-$TARGET/bin/activate
# Upgrade pip
pip install --upgrade pip
# Install requirements based on target
REQ_FILE="requirements.txt"
if [ "$TARGET" = "cuda" ] && [ -f "requirements-cuda.txt" ]; then
REQ_FILE="requirements-cuda.txt"
elif [ "$TARGET" = "rocm" ] && [ -f "requirements-rocm.txt" ]; then
REQ_FILE="requirements-rocm.txt"
fi
if [ -f "$REQ_FILE" ]; then
pip install -r $REQ_FILE
echo "Installed dependencies from $REQ_FILE"
else
echo "$REQ_FILE not found"
exit 1
fi
echo "Setup complete for $TARGET!"
echo "To activate the environment: source venv-$TARGET/bin/activate"
echo "To run the application: python vidai.py --help"
#!/bin/bash
# Video AI Startup Script
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
# Startup script for Video AI Analysis Tool
# Launches all processes in the correct order
TARGET=${1:-cuda} # Default to cuda
echo "Starting Video AI Analysis Tool for $TARGET..."
# Create result directory
mkdir -p /tmp/vidai_results
# Start backend
echo "Starting backend..."
python -m vidai.backend &  # run as a module so package-relative imports resolve
BACKEND_PID=$!
sleep 2
# Start workers
echo "Starting analysis worker..."
python -m vidai.worker_analysis $TARGET &
ANALYSIS_PID=$!
echo "Starting training worker..."
python -m vidai.worker_training $TARGET &
TRAINING_PID=$!
sleep 2
# Start web interface
echo "Starting web interface..."
python -m vidai.web &
WEB_PID=$!
echo "All processes started!"
echo "Web interface available at http://localhost:5000"
echo "Press Ctrl+C to stop all processes"
# Wait for interrupt
trap "echo 'Stopping all processes...'; kill $WEB_PID $TRAINING_PID $ANALYSIS_PID $BACKEND_PID; exit" INT
wait
<!DOCTYPE html>
<html>
<head>
<title>VideoModel AI</title>
<style>
body { font-family: Arial, sans-serif; background-color: #f4f4f4; margin: 0; padding: 20px; display: flex; justify-content: center; align-items: flex-start; }
.main { flex: 1; max-width: 800px; background: white; padding: 20px; border-radius: 8px; box-shadow: 0 0 10px rgba(0,0,0,0.1); margin-right: 20px; }
.sidebar { width: 300px; background: white; padding: 20px; border-radius: 8px; box-shadow: 0 0 10px rgba(0,0,0,0.1); }
h1 { color: #333; text-align: center; }
nav { text-align: center; margin-bottom: 20px; }
nav a { margin: 0 10px; text-decoration: none; color: #007bff; }
form { margin-bottom: 20px; }
label { display: block; margin-bottom: 5px; }
input[type="text"], input[type="file"], textarea { width: 100%; padding: 8px; margin-bottom: 10px; border: 1px solid #ccc; border-radius: 4px; }
input[type="submit"] { background: #007bff; color: white; padding: 10px; border: none; border-radius: 4px; cursor: pointer; }
input[type="submit"]:hover { background: #0056b3; }
.result { background: #e9ecef; padding: 10px; border-radius: 4px; }
.stats { font-size: 14px; }
</style>
</head>
<body>
<div class="main">
<h1>VideoModel AI Web Interface</h1>
<nav>
<a href="/">Analysis</a> | <a href="/train">Training</a>
</nav>
<h2>Analyze Image/Video</h2>
<form method="post" enctype="multipart/form-data">
<label>Model Path: <input type="text" name="model_path" value="{{ model_path_default }}"></label>
<p><a href="/system">Edit System Prompt</a></p>
<label>Upload File: <input type="file" name="file" accept="image/*,video/*" id="fileInput"></label>
<progress id="uploadProgress" value="0" max="100" style="display:none; width:100%;"></progress>
<div id="progressText"></div>
{% if allowed_dir %}
<label>Or Local Path: <input type="text" name="local_path" id="local_path"> <button type="button" onclick="openFileBrowser()">Browse</button></label>
{% endif %}
<label>Prompt: <textarea name="prompt" rows="5" cols="80">Describe this image.</textarea></label>
<input type="submit" value="Analyze">
<button type="button" onclick="cancelAnalysis()">Cancel Analysis</button>
</form>
<div class="result" id="result_div" style="display:none;"></div>
{% if result %}
<div class="result">
<h3>Result:</h3>
<p>{{ result }}</p>
</div>
{% endif %}
</div>
<div class="sidebar">
<div id="stats" class="stats">Loading stats...</div>
</div>
<script>
function openFileBrowser() {
window.open('/files', 'filebrowser', 'width=600,height=400');
}
async function updateStats() {
try {
const response = await fetch('/stats');
const data = await response.json();
let html = '<h3>GPU Stats</h3>';
html += `<p style="color: ${data.status === 'Idle' ? 'green' : 'orange'};">Status: ${data.status}</p>`;
if (data.elapsed > 0) {
html += `<p>Elapsed: ${data.elapsed.toFixed(1)}s</p>`;
}
if (data.gpu_count > 0) {
data.gpus.forEach((gpu, i) => {
let memPercent = (gpu.memory_used / gpu.memory_total * 100).toFixed(1);
html += `<p>GPU ${i}: ${gpu.name}<br>Memory: <progress value="${gpu.memory_used}" max="${gpu.memory_total}"></progress> ${gpu.memory_used.toFixed(2)} / ${gpu.memory_total.toFixed(2)} GB (${memPercent}%)<br>Utilization: ${gpu.utilization}%</p>`;
});
} else {
html += '<p>No GPUs detected</p>';
}
html += `<p>CPU: ${data.cpu_percent.toFixed(1)}%</p>`;
html += `<p>RAM: ${data.ram_used.toFixed(2)} / ${data.ram_total.toFixed(2)} GB</p>`;
document.getElementById('stats').innerHTML = html;
if (data.result) {
document.getElementById('result_div').innerHTML = '<h3>Result:</h3><p>' + data.result + '</p>';
document.getElementById('result_div').style.display = 'block';
}
} catch (e) {
document.getElementById('stats').innerHTML = '<p>Error loading stats</p>';
}
}
setInterval(updateStats, 5000);
window.onload = updateStats;
function cancelAnalysis() {
fetch('/cancel', {method: 'POST'}).then(() => updateStats());
}
// Upload progress with chunked upload
const form = document.querySelector('form');
if (form) {
form.addEventListener('submit', async function(e) {
e.preventDefault();
const fileInput = document.getElementById('fileInput');
const file = fileInput.files[0];
if (!file) {
// Submit form normally if no file
const formData = new FormData(this);
const xhr = new XMLHttpRequest();
xhr.addEventListener('load', function() {
window.location.reload();
});
xhr.open('POST', '/');
xhr.send(formData);
return;
}
const chunkSize = 1024 * 1024; // 1MB
const totalChunks = Math.ceil(file.size / chunkSize);
const uploadId = Date.now().toString();
const concurrency = 3;
let chunksSent = 0;
async function sendChunk(index) {
const start = index * chunkSize;
const end = Math.min(start + chunkSize, file.size);
const chunk = file.slice(start, end);
const formData = new FormData();
formData.append('chunk', chunk);
formData.append('chunk_index', index);
formData.append('total_chunks', totalChunks);
formData.append('file_name', file.name);
formData.append('upload_id', uploadId);
return new Promise((resolve) => {
const xhr = new XMLHttpRequest();
xhr.upload.addEventListener('progress', function(e) {
if (e.lengthComputable) {
const percent = ((chunksSent * chunkSize + e.loaded) / file.size) * 100;
document.getElementById('uploadProgress').value = percent;
document.getElementById('uploadProgress').style.display = 'block';
const speed = (chunksSent * chunkSize + e.loaded) / ((Date.now() - startTime) / 1000);
const remaining = (file.size - (chunksSent * chunkSize + e.loaded)) / speed;
document.getElementById('progressText').innerText = `Uploaded ${((chunksSent * chunkSize + e.loaded) / 1024 / 1024).toFixed(2)} MB of ${(file.size / 1024 / 1024).toFixed(2)} MB (${percent.toFixed(1)}%) - Speed: ${(speed / 1024 / 1024).toFixed(2)} MB/s - ETA: ${Math.round(remaining)}s`;
}
});
xhr.addEventListener('load', function() {
chunksSent++;
resolve();
});
xhr.open('POST', '/upload_chunk');
xhr.send(formData);
});
}
const startTime = Date.now();
for (let i = 0; i < totalChunks; i += concurrency) {
const promises = [];
for (let j = 0; j < concurrency && i + j < totalChunks; j++) {
promises.push(sendChunk(i + j));
}
await Promise.all(promises);
}
// All chunks sent, submit form
const formData2 = new FormData(form);
formData2.append('upload_id', uploadId);
formData2.append('file_name', file.name);
const xhr2 = new XMLHttpRequest();
xhr2.addEventListener('load', function() {
window.location.reload();
});
xhr2.open('POST', '/');
xhr2.send(formData2);
});
}
</script>
</body>
</html>
#!/usr/bin/env python3
# Video AI Analysis Tool
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
Video AI Analysis Tool
A web-based tool for analyzing images and videos using AI models.
Supports frame extraction, activity detection, and video segmentation.
"""
import argparse
import sys
import os
import subprocess
# Add current directory to path for vidai module
sys.path.insert(0, os.path.dirname(__file__))
from vidai.config import (
get_config, set_config, get_default_model, set_default_model,
get_analysis_backend, set_analysis_backend, get_training_backend, set_training_backend,
get_optimize, set_optimize, get_ffmpeg, set_ffmpeg, get_flash, set_flash,
get_host, set_host, get_port, set_port, get_debug, set_debug, get_allowed_dir, set_allowed_dir
)
def main():
parser = argparse.ArgumentParser(
description="Video AI Analysis Tool - Web interface for AI-powered media analysis",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
python vidai.py --model Qwen/Qwen2.5-VL-7B-Instruct
python vidai.py --dir /path/to/media --optimize
python vidai.py --flash --ffmpeg
python vidai.py --analysis-backend rocm
"""
)
# Read defaults from config
default_model = get_default_model()
default_analysis_backend = get_analysis_backend()
default_training_backend = get_training_backend()
default_optimize = get_optimize()
default_ffmpeg = get_ffmpeg()
default_flash = get_flash()
default_host = get_host()
default_port = get_port()
default_debug = get_debug()
parser.add_argument(
'--model',
default=default_model,
help=f'Default model path or HuggingFace model name (default: {default_model})'
)
parser.add_argument(
'--dir',
    default=get_allowed_dir(),
help='Allowed directory for local file access'
)
parser.add_argument(
'--optimize',
action='store_true',
default=default_optimize,
help='Optimize frame extraction (resize to 640px width)'
)
parser.add_argument(
'--ffmpeg',
action='store_true',
default=default_ffmpeg,
help='Force use of ffmpeg for frame extraction instead of OpenCV'
)
parser.add_argument(
'--flash',
action='store_true',
default=default_flash,
help='Enable Flash Attention 2 for faster inference (requires flash-attn package)'
)
parser.add_argument(
'--analysis-backend',
choices=['cuda', 'rocm'],
default=default_analysis_backend,
help=f'Backend for analysis (default: {default_analysis_backend})'
)
parser.add_argument(
'--training-backend',
choices=['cuda', 'rocm'],
default=default_training_backend,
help=f'Backend for training (default: {default_training_backend})'
)
parser.add_argument(
'--host',
default=default_host,
help=f'Host to bind the server to (default: {default_host})'
)
parser.add_argument(
'--port',
type=int,
default=default_port,
help=f'Port to bind the server to (default: {default_port})'
)
parser.add_argument(
'--debug',
action='store_true',
default=default_debug,
help='Enable debug mode'
)
args = parser.parse_args()
# Update config with command line values
set_default_model(args.model)
set_allowed_dir(args.dir)
set_optimize(args.optimize)
set_ffmpeg(args.ffmpeg)
set_flash(args.flash)
set_analysis_backend(args.analysis_backend)
set_training_backend(args.training_backend)
set_host(args.host)
set_port(args.port)
set_debug(args.debug)
print("Starting Video AI Analysis Tool...")
print(f"Server will be available at http://{args.host}:{args.port}")
print("Press Ctrl+C to stop")
# Start backend process
backend_cmd = [sys.executable, '-m', 'vidai.backend']
backend_proc = subprocess.Popen(backend_cmd)
# Start web process
web_cmd = [sys.executable, '-m', 'vidai.web']
web_proc = subprocess.Popen(web_cmd)
try:
# Wait for processes
backend_proc.wait()
web_proc.wait()
except KeyboardInterrupt:
print("Shutting down...")
backend_proc.terminate()
web_proc.terminate()
backend_proc.wait()
web_proc.wait()
if __name__ == "__main__":
main()
# Video AI Package
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
Video AI Analysis Tool Package
"""
# Video AI Backend Process
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
Backend process for Video AI.
Manages request routing between web interface and worker processes.
"""
import json
import os
import time
import threading
from .comm import SocketServer, Message
from .config import get_analysis_backend, get_training_backend, set_analysis_backend, set_training_backend
worker_sockets = {} # type: dict
def handle_web_message(message: Message) -> Message:
"""Handle messages from web interface."""
if message.msg_type == 'analyze_request':
backend = get_analysis_backend()
worker_key = f'analysis_{backend}'
if worker_key in worker_sockets:
            # Forward to the worker as a JSON line; use json.dumps rather than
            # interpolating the dict repr so the payload is valid JSON on the wire
            worker_sockets[worker_key].sendall(
                (json.dumps({'msg_type': message.msg_type,
                             'msg_id': message.msg_id,
                             'data': message.data}) + '\n').encode('utf-8')
            )
            return None  # worker replies asynchronously via the result store
else:
return Message('error', message.msg_id, {'error': f'Worker {worker_key} not available'})
elif message.msg_type == 'train_request':
backend = get_training_backend()
worker_key = f'training_{backend}'
if worker_key in worker_sockets:
            worker_sockets[worker_key].sendall(
                (json.dumps({'msg_type': message.msg_type,
                             'msg_id': message.msg_id,
                             'data': message.data}) + '\n').encode('utf-8')
            )
            return None
else:
return Message('error', message.msg_id, {'error': f'Worker {worker_key} not available'})
elif message.msg_type == 'config_update':
data = message.data
if 'analysis_backend' in data:
set_analysis_backend(data['analysis_backend'])
if 'training_backend' in data:
set_training_backend(data['training_backend'])
return Message('config_response', message.msg_id, {'status': 'updated'})
elif message.msg_type == 'get_config':
return Message('config_response', message.msg_id, {
'analysis_backend': get_analysis_backend(),
'training_backend': get_training_backend()
})
return Message('error', message.msg_id, {'error': 'Unknown message type'})
def handle_worker_message(message: Message, client_sock) -> None:
"""Handle messages from workers."""
if message.msg_type == 'register':
worker_type = message.data.get('type')
if worker_type:
worker_sockets[worker_type] = client_sock
print(f"Worker {worker_type} registered")
    elif message.msg_type in ['analyze_response', 'train_response']:
        # The web process connects per request rather than holding a persistent
        # socket, so results cannot be pushed back to it directly. Instead,
        # persist each result as a JSON file keyed by msg_id; the web process
        # polls /tmp/vidai_results/<msg_id>.json until it appears.
        result_dir = '/tmp/vidai_results'
        os.makedirs(result_dir, exist_ok=True)
        with open(os.path.join(result_dir, f"{message.msg_id}.json"), 'w') as f:
            json.dump({
                'msg_type': message.msg_type,
                'msg_id': message.msg_id,
                'data': message.data
            }, f)
def worker_message_handler(message: Message, client_sock) -> None:
"""Handler for worker messages."""
handle_worker_message(message, client_sock)
def backend_process() -> None:
"""Main backend process loop."""
print("Starting Video AI Backend...")
# Start web server on port 5001
web_server = SocketServer(port=5001)
web_server.start(handle_web_message)
# Start worker server on port 5002
worker_server = SocketServer(port=5002)
worker_server.start(worker_message_handler)
try:
while True:
time.sleep(1)
except KeyboardInterrupt:
print("Backend shutting down...")
web_server.stop()
worker_server.stop()
if __name__ == "__main__":
backend_process()
# Video AI Communication Module
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
Communication protocol for Video AI multi-process architecture.
Uses sockets for inter-process communication.
"""
import socket
import json
import threading
import time
from typing import Dict, Any, Optional
from dataclasses import dataclass
@dataclass
class Message:
"""Message structure for inter-process communication."""
msg_type: str
msg_id: str
data: Dict[str, Any]
class SocketCommunicator:
"""Handles socket-based communication."""
def __init__(self, host: str = 'localhost', port: int = 5001):
self.host = host
self.port = port
self.sock: Optional[socket.socket] = None
def connect(self) -> None:
"""Connect to the server."""
self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.sock.connect((self.host, self.port))
def send_message(self, message: Message) -> None:
"""Send a message."""
if self.sock:
data = json.dumps({
'msg_type': message.msg_type,
'msg_id': message.msg_id,
'data': message.data
}).encode('utf-8')
self.sock.sendall(data + b'\n')
def receive_message(self) -> Optional[Message]:
"""Receive a message."""
if self.sock:
try:
data = self.sock.recv(4096)
if data:
msg_data = json.loads(data.decode('utf-8').strip())
return Message(
msg_type=msg_data['msg_type'],
msg_id=msg_data['msg_id'],
data=msg_data['data']
)
            except (OSError, json.JSONDecodeError):
                # Connection dropped or a partial/garbled payload arrived;
                # report no message rather than crashing the caller
                pass
return None
def close(self) -> None:
"""Close the connection."""
if self.sock:
self.sock.close()
class SocketServer:
"""Simple socket server for handling connections."""
def __init__(self, host: str = 'localhost', port: int = 5001):
self.host = host
self.port = port
self.server_sock: Optional[socket.socket] = None
self.running = False
self.message_handler = None
def start(self, message_handler) -> None:
"""Start the server."""
self.message_handler = message_handler
self.server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
self.server_sock.bind((self.host, self.port))
self.server_sock.listen(5)
self.running = True
threading.Thread(target=self._accept_loop, daemon=True).start()
def _accept_loop(self) -> None:
"""Accept incoming connections."""
while self.running:
try:
client_sock, addr = self.server_sock.accept()
threading.Thread(target=self._handle_client, args=(client_sock,), daemon=True).start()
            except OSError:
                # Server socket was closed during shutdown
                break
def _handle_client(self, client_sock: socket.socket) -> None:
"""Handle a client connection."""
try:
while self.running:
data = client_sock.recv(4096)
if not data:
break
messages = data.decode('utf-8').split('\n')
for msg_str in messages:
if msg_str.strip():
try:
msg_data = json.loads(msg_str)
message = Message(
msg_type=msg_data['msg_type'],
msg_id=msg_data['msg_id'],
data=msg_data['data']
)
response = self.message_handler(message)
if response:
resp_data = json.dumps({
'msg_type': response.msg_type,
'msg_id': response.msg_id,
'data': response.data
}).encode('utf-8')
client_sock.sendall(resp_data + b'\n')
except json.JSONDecodeError:
pass
        except OSError:
            pass
finally:
client_sock.close()
def stop(self) -> None:
"""Stop the server."""
self.running = False
if self.server_sock:
self.server_sock.close()
# Video AI Configuration Module
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
Configuration management for Video AI.
Handles backend selection for analysis and training.
Uses SQLite database for persistence.
"""
from .database import get_config, set_config, get_all_config, get_system_prompt, set_system_prompt
def get_analysis_backend() -> str:
"""Get the selected backend for analysis."""
return get_config('analysis_backend', 'cuda')
def get_training_backend() -> str:
"""Get the selected backend for training."""
return get_config('training_backend', 'cuda')
def set_analysis_backend(backend: str) -> None:
"""Set the backend for analysis."""
set_config('analysis_backend', backend)
def set_training_backend(backend: str) -> None:
"""Set the backend for training."""
set_config('training_backend', backend)
def get_default_model() -> str:
"""Get the default model path."""
return get_config('default_model', 'Qwen/Qwen2.5-VL-7B-Instruct')
def set_default_model(model: str) -> None:
"""Set the default model path."""
set_config('default_model', model)
def get_frame_interval() -> int:
"""Get the default frame interval."""
return int(get_config('frame_interval', '10'))
def set_frame_interval(interval: int) -> None:
"""Set the default frame interval."""
set_config('frame_interval', str(interval))
def get_system_prompt_content(name: str = 'default') -> str:
"""Get system prompt content."""
return get_system_prompt(name)
def set_system_prompt_content(name: str, content: str) -> None:
"""Set system prompt content."""
set_system_prompt(name, content)
def get_optimize() -> bool:
"""Get optimize setting."""
return get_config('optimize', 'false').lower() == 'true'
def set_optimize(optimize: bool) -> None:
"""Set optimize setting."""
set_config('optimize', 'true' if optimize else 'false')
def get_ffmpeg() -> bool:
"""Get ffmpeg setting."""
return get_config('ffmpeg', 'false').lower() == 'true'
def set_ffmpeg(ffmpeg: bool) -> None:
"""Set ffmpeg setting."""
set_config('ffmpeg', 'true' if ffmpeg else 'false')
def get_flash() -> bool:
"""Get flash setting."""
return get_config('flash', 'false').lower() == 'true'
def set_flash(flash: bool) -> None:
"""Set flash setting."""
set_config('flash', 'true' if flash else 'false')
def get_host() -> str:
"""Get host setting."""
return get_config('host', '0.0.0.0')
def set_host(host: str) -> None:
"""Set host setting."""
set_config('host', host)
def get_port() -> int:
"""Get port setting."""
return int(get_config('port', '5000'))
def set_port(port: int) -> None:
"""Set port setting."""
set_config('port', str(port))
def get_debug() -> bool:
"""Get debug setting."""
return get_config('debug', 'false').lower() == 'true'
def set_debug(debug: bool) -> None:
"""Set debug setting."""
set_config('debug', 'true' if debug else 'false')
def get_allowed_dir() -> str:
"""Get allowed directory."""
return get_config('allowed_dir', '')
def set_allowed_dir(dir_path: str) -> None:
"""Set allowed directory."""
set_config('allowed_dir', dir_path)
def get_all_settings() -> dict:
"""Get all configuration settings."""
config = get_all_config()
return {
'analysis_backend': config.get('analysis_backend', 'cuda'),
'training_backend': config.get('training_backend', 'cuda'),
'default_model': config.get('default_model', 'Qwen/Qwen2.5-VL-7B-Instruct'),
'frame_interval': int(config.get('frame_interval', '10')),
'optimize': config.get('optimize', 'false').lower() == 'true',
'ffmpeg': config.get('ffmpeg', 'false').lower() == 'true',
'flash': config.get('flash', 'false').lower() == 'true',
'host': config.get('host', '0.0.0.0'),
'port': int(config.get('port', '5000')),
'debug': config.get('debug', 'false').lower() == 'true',
'allowed_dir': config.get('allowed_dir', ''),
'system_prompt': get_system_prompt_content()
}
# Video AI Database Module
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
Database management for Video AI.
Uses SQLite for persistent configuration storage.
"""
import sqlite3
import os
from typing import Dict, Any, Optional
DB_PATH = os.path.expanduser("~/.config/vidai/vidai.db")
def get_db_connection() -> sqlite3.Connection:
"""Get database connection, creating database if it doesn't exist."""
os.makedirs(os.path.dirname(DB_PATH), exist_ok=True)
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
init_db(conn)
return conn
def init_db(conn: sqlite3.Connection) -> None:
"""Initialize database tables if they don't exist."""
cursor = conn.cursor()
# Configuration table
cursor.execute('''
CREATE TABLE IF NOT EXISTS config (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
)
''')
# System prompts table
cursor.execute('''
CREATE TABLE IF NOT EXISTS system_prompts (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
# Insert default configurations if not exist
defaults = {
'analysis_backend': 'cuda',
'training_backend': 'cuda',
'default_model': 'Qwen/Qwen2.5-VL-7B-Instruct',
'frame_interval': '10',
'optimize': 'false',
'ffmpeg': 'false',
'flash': 'false',
'host': '0.0.0.0',
'port': '5000',
'debug': 'false',
'allowed_dir': ''
}
for key, value in defaults.items():
cursor.execute('INSERT OR IGNORE INTO config (key, value) VALUES (?, ?)', (key, value))
# Insert default system prompt if not exist
    cursor.execute('INSERT OR IGNORE INTO system_prompts (name, content) VALUES (?, ?)',
                   ('default', 'when the action performed by the person or persons in the frame changes, when the scenario changes, or when an active action begins after a long period of inactivity'))
conn.commit()
def get_config(key: str, default: str = '') -> str:
"""Get configuration value."""
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute('SELECT value FROM config WHERE key = ?', (key,))
row = cursor.fetchone()
conn.close()
return row['value'] if row else default
def set_config(key: str, value: str) -> None:
"""Set configuration value."""
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute('INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)', (key, value))
conn.commit()
conn.close()
def get_all_config() -> Dict[str, str]:
"""Get all configuration as dictionary."""
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute('SELECT key, value FROM config')
rows = cursor.fetchall()
conn.close()
return {row['key']: row['value'] for row in rows}
def get_system_prompt(name: str = 'default') -> str:
"""Get system prompt by name."""
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute('SELECT content FROM system_prompts WHERE name = ?', (name,))
row = cursor.fetchone()
conn.close()
return row['content'] if row else ''
def set_system_prompt(name: str, content: str) -> None:
"""Set system prompt."""
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute('''
INSERT OR REPLACE INTO system_prompts (name, content, updated_at)
VALUES (?, ?, CURRENT_TIMESTAMP)
''', (name, content))
conn.commit()
conn.close()
def get_all_system_prompts() -> Dict[str, Dict[str, Any]]:
"""Get all system prompts."""
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute('SELECT id, name, content, created_at, updated_at FROM system_prompts')
rows = cursor.fetchall()
conn.close()
return {row['name']: dict(row) for row in rows}
# Video AI Analysis Worker
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
Analysis worker process for Video AI.
Handles image/video analysis requests.
"""
import os
import sys
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import tempfile
import subprocess
import json
try:
    import cv2
except ImportError:
    cv2 = None  # extract_frames() falls back to ffmpeg when OpenCV is absent
import time
from .comm import SocketCommunicator, Message
from .config import get_system_prompt_content
# Set PyTorch CUDA memory management
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
# GPU delegation
gpu_mem = []
if torch.cuda.is_available():
for i in range(torch.cuda.device_count()):
gpu_mem.append(torch.cuda.get_device_properties(i).total_memory)
max_gpu = gpu_mem.index(max(gpu_mem)) if gpu_mem else 0
min_gpu = gpu_mem.index(min(gpu_mem)) if gpu_mem else 0
else:
max_gpu = min_gpu = 0
# Set OpenCV to smaller GPU if available
try:
if cv2 and hasattr(cv2, 'cuda'):
cv2.cuda.setDevice(min_gpu)
except:
pass
def extract_frames(video_path, interval=10, optimize=False):
if cv2:
cap = cv2.VideoCapture(video_path)
        fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # some containers report 0 fps; assume 30
        frame_interval = max(int(fps * interval), 1)
frames = []
count = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
if count % frame_interval == 0:
if optimize:
height, width = frame.shape[:2]
new_width = 640
new_height = int(height * new_width / width)
frame = cv2.resize(frame, (new_width, new_height))
temp_img = tempfile.NamedTemporaryFile(delete=False, suffix=".jpg")
cv2.imwrite(temp_img.name, frame)
frames.append((temp_img.name, count / fps))
count += 1
cap.release()
return frames, None
else:
output_dir = tempfile.mkdtemp()
vf = f"fps=1/{interval}"
if optimize:
vf += ",scale=640:-1"
cmd = ["ffmpeg", "-i", video_path, "-vf", vf, os.path.join(output_dir, "frame_%04d.jpg")]
subprocess.run(cmd, check=True, capture_output=True)
frames = []
for file in sorted(os.listdir(output_dir)):
if file.endswith('.jpg'):
path = os.path.join(output_dir, file)
frame_num = int(file.split('_')[1].split('.')[0])
ts = (frame_num - 1) * interval
frames.append((path, ts))
return frames, output_dir
def is_video(file_path):
return file_path.lower().endswith(('.mp4', '.avi', '.mov', '.mkv'))
def analyze_single_image(image_path, prompt, model, processor):
messages = [
{
"role": "user",
"content": [
{"type": "image", "image": image_path},
{"type": "text", "text": prompt},
],
}
]
inputs = processor.apply_chat_template(
messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs['input_ids'], generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
return output_text[0]
def analyze_media(media_path, prompt, model_path, interval=10):
torch.cuda.empty_cache()
if model_path not in model_cache:
kwargs = {"device_map": "auto", "low_cpu_mem_usage": True}
if os.path.exists(model_path):
try:
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_path, **kwargs)
proc_path = model_path
except:
model = Qwen2_5_VLForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct", **kwargs)
proc_path = "Qwen/Qwen2.5-VL-7B-Instruct"
else:
model = Qwen2_5_VLForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct", **kwargs)
proc_path = "Qwen/Qwen2.5-VL-7B-Instruct"
model_cache[model_path] = model
processor_cache[model_path] = AutoProcessor.from_pretrained(proc_path)
else:
model = model_cache[model_path]
proc_path = model_path if os.path.exists(model_path) else "Qwen/Qwen2.5-VL-7B-Instruct"
processor = processor_cache[model_path]
system_prompt = get_system_prompt_content()
full_prompt = system_prompt + " " + prompt if system_prompt else prompt
if is_video(media_path):
frames, output_dir = extract_frames(media_path, interval, optimize=True)
total_frames = len(frames)
descriptions = []
for i, (frame_path, ts) in enumerate(frames):
desc = analyze_single_image(frame_path, full_prompt, model, processor)
descriptions.append(f"At {ts:.2f}s: {desc}")
os.unlink(frame_path)
if output_dir:
import shutil
shutil.rmtree(output_dir)
summary_prompt = f"Summarize the video based on frame descriptions: {' '.join(descriptions)}"
messages = [{"role": "user", "content": [{"type": "text", "text": summary_prompt}]}]
inputs = processor.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs['input_ids'], generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
summary = output_text[0]
result = f"Frame Descriptions:\n" + "\n".join(descriptions) + f"\n\nSummary:\n{summary}"
return result
else:
result = analyze_single_image(media_path, full_prompt, model, processor)
torch.cuda.empty_cache()
return result
model_cache = {}
processor_cache = {}
def worker_process(backend_type: str):
"""Main worker process."""
print(f"Starting Analysis Worker for {backend_type}...")
comm = SocketCommunicator(port=5002)
comm.connect()
# Register with backend
register_msg = Message('register', 'register', {'type': f'analysis_{backend_type}'})
comm.send_message(register_msg)
while True:
try:
message = comm.receive_message()
if message and message.msg_type == 'analyze_request':
data = message.data
media_path = data.get('local_path', data.get('file_name', ''))
if not media_path:
result = 'No media path provided'
else:
prompt = data.get('prompt', 'Describe this image.')
model_path = data.get('model_path', 'Qwen/Qwen2.5-VL-7B-Instruct')
interval = data.get('interval', 10)
result = analyze_media(media_path, prompt, model_path, interval)
response = Message('analyze_response', message.msg_id, {'result': result})
comm.send_message(response)
time.sleep(0.1)
except Exception as e:
print(f"Worker error: {e}")
time.sleep(1)
if __name__ == "__main__":
backend_type = sys.argv[1] if len(sys.argv) > 1 else 'cuda'
worker_process(backend_type)
# Video AI Training Worker
# Copyright (C) 2024 Stefy Lanza <stefy@sexhack.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
Training worker process for Video AI.
Handles model training requests.
"""
import os
import sys
import subprocess
import tempfile
import shutil
import json
import time
from .comm import SocketCommunicator, Message
def train_model(train_path, output_model, description):
"""Perform training."""
desc_file = os.path.join(train_path, "description.txt")
with open(desc_file, "w") as f:
f.write(description)
# Assume videotrain is available
cmd = ["python", "videotrain", train_path, "--output_dir", output_model]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode == 0:
return "Training completed!"
else:
return f"Training failed: {result.stderr}"
def worker_process(backend_type: str):
"""Main worker process."""
print(f"Starting Training Worker for {backend_type}...")
comm = SocketCommunicator(port=5002)
comm.connect()
# Register with backend
register_msg = Message('register', 'register', {'type': f'training_{backend_type}'})
comm.send_message(register_msg)
while True:
try:
message = comm.receive_message()
if message and message.msg_type == 'train_request':
data = message.data
output_model = data.get('output_model', './VideoModel')
description = data.get('description', '')
train_dir = data.get('train_dir', '')
if train_dir and os.path.isdir(train_dir):
result = train_model(train_dir, output_model, description)
else:
result = "No valid training directory provided"
response = Message('train_response', message.msg_id, {'message': result})
comm.send_message(response)
time.sleep(0.1)
except Exception as e:
print(f"Worker error: {e}")
time.sleep(1)
if __name__ == "__main__":
backend_type = sys.argv[1] if len(sys.argv) > 1 else 'cuda'
worker_process(backend_type)