Skip to content

Database Documentation

This document provides comprehensive documentation for the database setup, migrations, and management in the Yappa Knowledge Hub backend.

Overview

The backend uses SQLite as the local database with Doctrine ORM for object-relational mapping and Doctrine Migrations for schema management.

Why SQLite?

  • Simplicity: No separate database server required
  • Portability: Single file database
  • Performance: Fast for read-heavy workloads
  • Zero Configuration: Works out of the box
  • Perfect for Development: Easy to reset and seed

Database Location

backend/var/data.db

The database file is created automatically when you run migrations.


Database Configuration

Environment Variables

Configure the database in .env or .env.local:

bash
# SQLite database
DATABASE_URL="sqlite:///%kernel.project_dir%/var/data.db"

# Alternative: Absolute path
DATABASE_URL="sqlite:///https://github.com/undead2146/KnowledgeHub/blob/main/backend/var/data.db"

Doctrine Configuration

Located in config/packages/doctrine.yaml:

yaml
doctrine:
    dbal:
        url: '%env(resolve:DATABASE_URL)%'

    orm:
        auto_generate_proxy_classes: true
        enable_lazy_ghost_objects: true
        report_fields_where_declared: true
        validate_xml_mapping: true
        naming_strategy: doctrine.orm.naming_strategy.underscore_number_aware
        auto_mapping: true
        mappings:
            App:
                type: attribute
                is_bundle: false
                dir: '%kernel.project_dir%/src/Entity'
                prefix: 'App\Entity'
                alias: App

Database Schema

Tables

The database consists of 8 tables:

  1. category - Knowledge categories
  2. knowledge - Main knowledge items
  3. resource - External resources
  4. resource_content - Extracted content
  5. summary - AI summaries
  6. tag - Tags
  7. resource_tag - Many-to-many join table
  8. resource_list - Many-to-many join table

Schema Diagram

See the ER diagram in entities.md for visual representation.

Detailed Schema

category

sql
CREATE TABLE category (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    name VARCHAR(255) NOT NULL,
    description VARCHAR(255) DEFAULT NULL,
    icon VARCHAR(50) DEFAULT ':file_folder:',
    target_groups TEXT DEFAULT NULL,  -- JSON array
    notion_id VARCHAR(255) DEFAULT NULL,
    last_synced_at DATETIME DEFAULT NULL,
    notion_last_edited_at DATETIME DEFAULT NULL,
    created_at DATETIME DEFAULT NULL,
    is_active BOOLEAN DEFAULT 1,
    sort_order INTEGER DEFAULT 0
);

CREATE INDEX IDX_category_notion_id ON category(notion_id);

knowledge

sql
CREATE TABLE knowledge (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    title VARCHAR(255) NOT NULL,
    content TEXT NOT NULL,
    category_id INTEGER DEFAULT NULL,
    tags TEXT DEFAULT NULL,  -- JSON array
    target_groups TEXT DEFAULT NULL,  -- JSON array
    user_id VARCHAR(255) DEFAULT NULL,
    status VARCHAR(50) DEFAULT 'pending',
    url VARCHAR(500) DEFAULT NULL,
    url_metadata TEXT DEFAULT NULL,  -- JSON object
    source_message TEXT DEFAULT NULL,  -- JSON object
    attachments TEXT DEFAULT NULL,  -- JSON array
    summaries TEXT DEFAULT NULL,  -- JSON array
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,
    notion_id VARCHAR(255) DEFAULT NULL,
    notion_url VARCHAR(500) DEFAULT NULL,
    last_synced_at DATETIME DEFAULT NULL,
    notion_last_edited_at DATETIME DEFAULT NULL,
    CONSTRAINT FK_knowledge_category
        FOREIGN KEY (category_id)
        REFERENCES category (id)
        ON DELETE SET NULL
);

CREATE INDEX IDX_knowledge_category ON knowledge(category_id);
CREATE INDEX IDX_knowledge_notion_id ON knowledge(notion_id);
CREATE INDEX IDX_knowledge_status ON knowledge(status);
CREATE INDEX IDX_knowledge_user_id ON knowledge(user_id);

resource

sql
CREATE TABLE resource (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    title VARCHAR(255) NOT NULL,
    url VARCHAR(500) DEFAULT NULL,
    source_type VARCHAR(20) NOT NULL,  -- enum: url, text, pdf, rss, youtube, podcast
    status VARCHAR(20) NOT NULL,  -- enum: pending, processing, done, failed
    notion_id VARCHAR(255) DEFAULT NULL,
    last_synced_at DATETIME DEFAULT NULL,
    notion_last_edited_at DATETIME DEFAULT NULL,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL
);

CREATE INDEX IDX_resource_notion_id ON resource(notion_id);
CREATE INDEX IDX_resource_status ON resource(status);
CREATE INDEX IDX_resource_source_type ON resource(source_type);

resource_content

sql
CREATE TABLE resource_content (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    content TEXT NOT NULL,
    content_type VARCHAR(50) DEFAULT 'text/plain',
    resource_id INTEGER NOT NULL,
    created_at DATETIME NOT NULL,
    CONSTRAINT FK_resource_content_resource
        FOREIGN KEY (resource_id)
        REFERENCES resource (id)
        ON DELETE CASCADE
);

CREATE INDEX IDX_resource_content_resource ON resource_content(resource_id);

summary

sql
CREATE TABLE summary (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    content TEXT NOT NULL,
    model VARCHAR(50) DEFAULT NULL,
    resource_id INTEGER NOT NULL,
    created_at DATETIME NOT NULL,
    CONSTRAINT FK_summary_resource
        FOREIGN KEY (resource_id)
        REFERENCES resource (id)
        ON DELETE CASCADE
);

CREATE INDEX IDX_summary_resource ON summary(resource_id);

tag

sql
CREATE TABLE tag (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    name VARCHAR(255) NOT NULL UNIQUE
);

CREATE UNIQUE INDEX UNIQ_tag_name ON tag(name);

resource_tag (join table)

sql
CREATE TABLE resource_tag (
    resource_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (resource_id, tag_id),
    CONSTRAINT FK_resource_tag_resource
        FOREIGN KEY (resource_id)
        REFERENCES resource (id)
        ON DELETE CASCADE,
    CONSTRAINT FK_resource_tag_tag
        FOREIGN KEY (tag_id)
        REFERENCES tag (id)
        ON DELETE CASCADE
);

CREATE INDEX IDX_resource_tag_resource ON resource_tag(resource_id);
CREATE INDEX IDX_resource_tag_tag ON resource_tag(tag_id);

resource_list (join table)

sql
CREATE TABLE resource_list (
    resource_id INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    PRIMARY KEY (resource_id, category_id),
    CONSTRAINT FK_resource_list_resource
        FOREIGN KEY (resource_id)
        REFERENCES resource (id)
        ON DELETE CASCADE,
    CONSTRAINT FK_resource_list_category
        FOREIGN KEY (category_id)
        REFERENCES category (id)
        ON DELETE CASCADE
);

CREATE INDEX IDX_resource_list_resource ON resource_list(resource_id);
CREATE INDEX IDX_resource_list_category ON resource_list(category_id);

Migrations

Doctrine Migrations manages database schema changes through versioned migration files.

Migration Files

Located in backend/migrations/:

  • Version20260211153507.php - Initial schema
  • Version20260211155807.php - Schema updates

Creating Migrations

Generate Migration from Entity Changes

When you modify entities, generate a migration:

bash
cd backend
php bin/console make:migration

This analyzes your entities and generates a migration file with the necessary SQL changes.

Review the Migration

Always review the generated migration before running it:

php
// migrations/VersionXXXXXXXXXXXXXX.php
public function up(Schema $schema): void
{
    // Review these SQL statements
    $this->addSql('ALTER TABLE knowledge ADD COLUMN new_field VARCHAR(255)');
}

Running Migrations

Execute All Pending Migrations

bash
cd backend
php bin/console doctrine:migrations:migrate

This runs all migrations that haven't been executed yet.

Execute Specific Migration

bash
php bin/console doctrine:migrations:execute --up VersionXXXXXXXXXXXXXX

Dry Run (Preview SQL)

bash
php bin/console doctrine:migrations:migrate --dry-run

Migration Status

Check which migrations have been executed:

bash
php bin/console doctrine:migrations:status

Output:

 == Configuration

    >> Name:                                               Application Migrations
    >> Database Driver:                                    pdo_sqlite
    >> Database Host:                                      localhost
    >> Database Name:                                      data.db
    >> Configuration Source:                               manually configured
    >> Version Table Name:                                 doctrine_migration_versions
    >> Version Column Name:                                version
    >> Migrations Namespace:                               DoctrineMigrations
    >> Migrations Directory:                               /path/to/migrations
    >> Previous Version:                                   Already at first version
    >> Current Version:                                    2026-02-11 15:58:07 (Version20260211155807)
    >> Next Version:                                       Already at latest version
    >> Latest Version:                                     2026-02-11 15:58:07 (Version20260211155807)
    >> Executed Migrations:                                2
    >> Executed Unavailable Migrations:                    0
    >> Available Migrations:                               2
    >> New Migrations:                                     0

Rolling Back Migrations

Rollback to Previous Version

bash
php bin/console doctrine:migrations:migrate prev

Rollback to Specific Version

bash
php bin/console doctrine:migrations:migrate VersionXXXXXXXXXXXXXX

Rollback All Migrations

bash
php bin/console doctrine:migrations:migrate first

Database Management

Creating the Database

Create the database file:

bash
cd backend
php bin/console doctrine:database:create

This creates var/data.db if it doesn't exist.

Dropping the Database

Delete the database file:

bash
php bin/console doctrine:database:drop --force

Warning: This deletes all data!

Validating Schema

Check if the database schema matches your entities:

bash
php bin/console doctrine:schema:validate

Output:

[Mapping]  OK - The mapping files are correct.
[Database] OK - The database schema is in sync with the mapping files.

Updating Schema (Development Only)

For quick development, update schema without migrations:

bash
php bin/console doctrine:schema:update --force

Warning: Only use in development! Use migrations in production.


Seeding Data

Using Data Fixtures

Data fixtures provide sample data for development and testing.

Create Fixture

php
// src/DataFixtures/DemoDataFixtures.php
namespace App\DataFixtures;

use App\Entity\Category;
use App\Entity\Knowledge;
use Doctrine\Bundle\FixturesBundle\Fixture;
use Doctrine\Persistence\ObjectManager;

class DemoDataFixtures extends Fixture
{
    public function load(ObjectManager $manager): void
    {
        // Create categories
        $category = new Category();
        $category->setName('Development')
            ->setDescription('Development resources')
            ->setIcon('')
            ->setTargetGroups(['developers'])
            ->setIsActive(true)
            ->setSortOrder(1);
        $manager->persist($category);

        // Create knowledge items
        for ($i = 1; $i <= 10; $i++) {
            $knowledge = new Knowledge();
            $knowledge->setTitle("Knowledge Item $i")
                ->setContent("This is the content for knowledge item $i")
                ->setCategory($category)
                ->setTags(['tag1', 'tag2'])
                ->setTargetGroups(['developers'])
                ->setStatus('published');
            $manager->persist($knowledge);
        }

        $manager->flush();
    }
}

Load Fixtures

bash
cd backend
php bin/console doctrine:fixtures:load

Warning: This purges the database before loading!

Append Fixtures (Don't Purge)

bash
php bin/console doctrine:fixtures:load --append

Using Console Commands

The backend includes custom commands for seeding:

PopulateKnowledgeCommand

bash
php bin/console app:populate-knowledge

Populates knowledge items from a predefined source.

SeedPocDataCommand

bash
php bin/console app:seed-poc-data

Seeds proof-of-concept data for testing.


Querying the Database

Using Doctrine ORM

php
// Get entity manager
$entityManager = $this->getDoctrine()->getManager();

// Get repository
$repository = $entityManager->getRepository(Knowledge::class);

// Find by ID
$knowledge = $repository->find(1);

// Find one by criteria
$knowledge = $repository->findOneBy(['title' => 'My Title']);

// Find all
$items = $repository->findAll();

// Find with criteria and sorting
$items = $repository->findBy(
    ['status' => 'published'],
    ['createdAt' => 'DESC'],
    10,  // limit
    0    // offset
);

// Custom query with QueryBuilder
$qb = $repository->createQueryBuilder('k')
    ->where('k.title LIKE :search')
    ->andWhere('k.status = :status')
    ->setParameter('search', '%symfony%')
    ->setParameter('status', 'published')
    ->orderBy('k.createdAt', 'DESC')
    ->setMaxResults(10);

$results = $qb->getQuery()->getResult();

Using DQL (Doctrine Query Language)

php
$dql = 'SELECT k FROM App\Entity\Knowledge k
        WHERE k.title LIKE :search
        ORDER BY k.createdAt DESC';

$query = $entityManager->createQuery($dql)
    ->setParameter('search', '%symfony%')
    ->setMaxResults(10);

$results = $query->getResult();

Using Native SQL

php
$conn = $entityManager->getConnection();

$sql = 'SELECT * FROM knowledge WHERE title LIKE :search';
$stmt = $conn->prepare($sql);
$stmt->bindValue('search', '%symfony%');
$resultSet = $stmt->executeQuery();

$results = $resultSet->fetchAllAssociative();

Direct SQLite Access

You can also query the database directly:

bash
cd backend/var
sqlite3 data.db

# List tables
.tables

# Describe table
.schema knowledge

# Query
SELECT * FROM knowledge LIMIT 10;

# Exit
.quit

Backup and Restore

Backup Database

Since SQLite is a single file, backup is simple:

bash
# Copy the database file
cp backend/var/data.db backend/var/data.db.backup

# Or with timestamp
cp backend/var/data.db backend/var/data.db.$(date +%Y%m%d_%H%M%S)

Restore Database

bash
# Restore from backup
cp backend/var/data.db.backup backend/var/data.db

Export to SQL

bash
cd backend/var
sqlite3 data.db .dump > backup.sql

Import from SQL

bash
cd backend/var
sqlite3 data.db < backup.sql

Performance Optimization

Indexes

The schema includes indexes on frequently queried columns:

  • notion_id columns (for sync operations)
  • category_id (for joins)
  • status (for filtering)
  • user_id (for user queries)

Query Optimization

  1. Use Indexes: Ensure WHERE clauses use indexed columns
  2. Limit Results: Always use LIMIT for large datasets
  3. Lazy Loading: Doctrine lazy-loads relationships by default
  4. Eager Loading: Use JOIN FETCH for related entities you need
php
// Eager load category with knowledge items
$qb = $repository->createQueryBuilder('k')
    ->leftJoin('k.category', 'c')
    ->addSelect('c')
    ->where('k.status = :status')
    ->setParameter('status', 'published');
  1. Pagination: Use Doctrine Paginator for large result sets
php
use Doctrine\ORM\Tools\Pagination\Paginator;

$query = $repository->createQueryBuilder('k')
    ->setFirstResult(0)
    ->setMaxResults(20)
    ->getQuery();

$paginator = new Paginator($query);

Database Maintenance

SQLite doesn't require much maintenance, but you can:

Vacuum (Reclaim Space)

bash
sqlite3 backend/var/data.db "VACUUM;"

Analyze (Update Statistics)

bash
sqlite3 backend/var/data.db "ANALYZE;"

Troubleshooting

Database Locked Error

Problem: database is locked error

Solution:

  • Close all connections to the database
  • Check for long-running transactions
  • Increase timeout in configuration
yaml
# config/packages/doctrine.yaml
doctrine:
    dbal:
        options:
            timeout: 30

Schema Out of Sync

Problem: Schema doesn't match entities

Solution:

bash
# Check differences
php bin/console doctrine:schema:update --dump-sql

# Generate migration
php bin/console make:migration

# Run migration
php bin/console doctrine:migrations:migrate

Migration Failed

Problem: Migration execution failed

Solution:

bash
# Check migration status
php bin/console doctrine:migrations:status

# Mark migration as executed (if already applied manually)
php bin/console doctrine:migrations:version VersionXXXXXXXXXXXXXX --add

# Or rollback and retry
php bin/console doctrine:migrations:migrate prev
php bin/console doctrine:migrations:migrate

Database Corruption

Problem: Database file is corrupted

Solution:

bash
# Check integrity
sqlite3 backend/var/data.db "PRAGMA integrity_check;"

# If corrupted, restore from backup
cp backend/var/data.db.backup backend/var/data.db

# Or recreate from migrations
rm backend/var/data.db
php bin/console doctrine:database:create
php bin/console doctrine:migrations:migrate
php bin/console doctrine:fixtures:load

Best Practices

  1. Always Use Migrations: Never modify schema directly
  2. Review Generated Migrations: Check SQL before running
  3. Backup Before Migrations: Backup database before running migrations in production
  4. Use Transactions: Wrap multiple operations in transactions
  5. Validate Schema: Run doctrine:schema:validate regularly
  6. Use Fixtures for Testing: Create fixtures for consistent test data
  7. Index Frequently Queried Columns: Add indexes for performance
  8. Use Query Builder: Prefer QueryBuilder over raw SQL
  9. Lazy Load Relationships: Don't eager load unless needed
  10. Monitor Query Performance: Log slow queries in production

Production Considerations

Database Location

In production, consider storing the database outside the web root:

bash
DATABASE_URL="sqlite:////var/lib/yappa-knowledge-hub/data.db"

Permissions

Ensure proper file permissions:

bash
chmod 664 /var/lib/yappa-knowledge-hub/data.db
chown www-data:www-data /var/lib/yappa-knowledge-hub/data.db

Backup Strategy

Implement automated backups:

bash
# Cron job for daily backups
0 2 * * * cp /var/lib/yappa-knowledge-hub/data.db /backups/data.db.$(date +\%Y\%m\%d)

Monitoring

Monitor database size and performance:

bash
# Check database size
ls -lh backend/var/data.db

# Check table sizes
sqlite3 backend/var/data.db "SELECT name, SUM(pgsize) as size FROM dbstat GROUP BY name ORDER BY size DESC;"