Yappa Knowledge Hub - Consolidated Product Requirements Document
Version: 4.0 (Consolidated)
Date: 2026-02-20
Status: Sprint 0 Complete | MVP Planning Phase
Owner: Yappa Internship Project
Executive Summary
The Yappa Knowledge Hub is an internal knowledge management system that enables Yappa employees to capture, organize, and distribute knowledge through Slack, with AI-powered summaries and automated digests. The system uses Notion as the single source of truth, Symfony as the processing layer, and Slack as the primary user interface.
Sprint 0 (POC) Achievement: Successfully demonstrated end-to-end integration between Slack, Symfony, and Notion with 19 knowledge items stored across 10 categories. The system validates the technical approach and is ready for team review before merging to main branch.
MVP Path: Requires 80-100 additional tickets (400-500 story points) to reach production-ready status with AI summaries, digest automation, and production infrastructure.
1. Product Vision
What is Yappa Knowledge Hub
A central system that makes internal knowledge:
- Easy to capture - Low friction submission via Slack
- Smartly processed - AI summaries tailored per target audience
- Actively distributed - Periodic digests by thematic list and target group
- Centrally accessible - Notion as single source of truth with web UI
Target Users
- Contributors (Yappa employees): Submit content, select lists, target groups, and tags
- List Owners/Admins: Create/edit lists, configure digest schedules, manage prompts
- Digest Recipients: Receive scheduled digests via Slack DM/channel
Target Groups (Audience Segments)
Semantic roles used for tone and content focus:
- Developers
- Marketers
- CEO/Leadership
- Service Desk
- Sales
- Operations
Core Value Proposition
Problem: Internal knowledge at Yappa is scattered across Slack messages, emails, notes, and announcements. Information is shared "in the moment" but becomes hard to find, contextualize, and distribute.
Solution: A Slack-first knowledge hub that captures information with minimal friction, processes it with AI for different audiences, and distributes it through automated digests.
2. Sprint 0 (POC) Achievements
What Was Built (Honest Assessment)
Completion Date: 2026-02-16
Status: Functional prototype validating technical approach
Working Infrastructure
- Slack bot connected via Socket Mode (Node.js + @slack/bolt)
- Symfony 7.2 backend with REST API (PHP 8.2)
- Notion databases (3) created and accessible
- Basic CRUD operations for knowledge and categories
- Slack → Notion sync in under 1 second
- Cron-based Notion → Slack sync (every 2 minutes)
- Dutch UI strings (basic localization)
- 19 knowledge items successfully stored
- 10 categories with icons and descriptions
Slack Bot Features
- Message shortcuts ("Save to Knowledge Hub")
- Global shortcuts ("Quick Add Knowledge")
- Slash command /knowledge with subcommands (add, search, dashboard, help)
- Modal forms for knowledge submission
- App Home tab with stats and category browser
- URL detection with basic metadata scraping
- File share detection (detection only, no processing)
- Emoji reaction trigger (basic implementation)
- Dutch confirmation messages
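The URL detection listed above can be illustrated with a minimal Python sketch (the bot itself runs on Node.js, so this is not the project's code). It assumes Slack's usual message markup, where links arrive wrapped as `<https://example.com|label>`; the function name `extract_urls` is hypothetical.

```python
import re

# Slack wraps links in angle brackets, optionally followed by "|label".
SLACK_LINK = re.compile(r"<(https?://[^|>\s]+)(?:\|[^>]*)?>")


def extract_urls(message_text: str) -> list[str]:
    """Return the unique http(s) URLs in a Slack message, in order of appearance."""
    seen: dict[str, None] = {}
    for match in SLACK_LINK.finditer(message_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


text = "Read <https://example.com/post|this post> and <https://example.com/docs>, cc <@U123ABC>"
print(extract_urls(text))
```

User mentions such as `<@U123ABC>` are skipped because only `http` and `https` targets match the pattern.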
Symfony Backend
- REST API endpoints for Knowledge and Categories
- NotionClient with basic retry logic
- NotionKnowledgeService for CRUD operations
- NotionCategoryService for category management
- NotionPropertyMapper for data conversion
- NotionSyncService for bidirectional sync
- Basic error handling and logging
Notion Integration
- Knowledge Database (ID: 306e292a15d58004a8cbc222dcd48bb2)
- Categories Database (ID: 306e292a15d5805dae13e64bed8519c5)
- Digests Database (ID: 306e292a15d580d7a0f6fe8421baff10)
- Manual sync endpoint operational
- Direct web access for manual editing
What Was NOT Built
Missing Core Features:
- AI summary generation (stub only, no OpenAI integration)
- Digest scheduling and delivery (mock formatter only)
- Real-time webhook sync (uses 2-minute polling)
- Persistent state management (in-memory tracking lost on restart)
- Production infrastructure (no queues, retry logic, monitoring)
- User authentication and permissions
- Analytics and reporting
- Web dashboard
- Advanced search and filtering
Missing Extra Stories:
- Proper URL extraction (basic scraping only)
- PDF upload support (detection only)
- Tag normalization and autocomplete
- URL validation and sanitization
- List suggestions and multi-list assignment
- Rate limiting for APIs
- Queue system for async processing
- Health monitoring endpoints
30 Tickets Completed
The Sprint 0 work represents approximately 30 completed tickets covering:
- Slack bot foundation (8 tickets)
- Symfony backend setup (6 tickets)
- Notion integration (5 tickets)
- Basic CRUD operations (6 tickets)
- Sync system (3 tickets)
- Dutch localization (2 tickets)
Total Story Points Completed: ~150 SP
Architecture Decisions Made
- Notion as Primary Database: All data stored in Notion databases, no local database required
- Slack-First UX: Primary interface through Slack bot, web dashboard secondary
- Symfony REST API: Backend processing layer between Slack and Notion
- Socket Mode for Development: Simple connection model for POC, HTTP mode for production
- Dutch Localization: All user-facing strings in Dutch
- Target Group Model: Audience-based content tailoring for summaries and digests
3. Sprint 0 Status
Completed Features
Ready for Main Branch Merge:
- Slack bot with multiple interaction patterns (shortcuts, commands, modals)
- Functional Symfony REST API with Notion integration
- Bidirectional sync (Slack → Notion instant, Notion → Slack 2-min polling)
- 19 knowledge items successfully stored
- 10 categories with icons and descriptions
- Dutch localization for user-facing strings
- Basic URL detection and metadata scraping
Technical Quality:
- Code follows Symfony and Node.js best practices
- Basic error handling implemented
- Logging configured
- Documentation complete
- All systems operational
Pending Team Review
Before Merging to Main:
- Code review by senior developers
- Security review of API endpoints
- Dutch translation review by native speaker
- UX review of Slack modals
- Performance testing with concurrent users
- Decision on production deployment strategy
Known Limitations
Technical Debt:
- In-memory view tracking (lost on restart)
- No rate limiting (API quota risk)
- No retry logic (failed requests lost)
- Socket Mode only (not production-ready)
- Basic error handling (needs enhancement)
- Cron-based sync (not real-time)
Scope Limitations:
- Single category per knowledge item
- Hardcoded target groups (6 groups)
- No AI functionality
- No digest system
- No user management
- No analytics
4. Product Roadmap
Sprint 1-2: Infrastructure Hardening (Weeks 1-4)
Objective: Make POC production-ready
Deliverables:
- Redis integration for persistent state
- Queue system (Symfony Messenger)
- Retry logic with exponential backoff
- Rate limiting for Notion API
- Health check endpoints
- Structured logging with context
Story Points: 80 SP
Priority: Critical (P0)
Acceptance Criteria:
- System survives restart without data loss
- Failed operations retry automatically
- API rate limits respected
- Health endpoint returns 200 OK
- Logs include request IDs and context
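To make the "retry with exponential backoff" deliverable concrete, here is a minimal Python sketch (the production code would live in Symfony). The `TransientError` class, the function name, and the default delays are assumptions for illustration, not values taken from the project.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 429/5xx)."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # The delay doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

Passing `sleep` as a parameter keeps the function testable without real waiting; non-transient errors propagate immediately rather than being retried.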
Sprint 3-4: Content Ingestion Extra Story (Weeks 3-6)
Objective: Improve content capture quality
Deliverables:
- Full URL extraction service
- Tag normalization and autocomplete
- Enhanced emoji reaction processing
- URL validation and sanitization
- Multi-list assignment support
Story Points: 92 SP
Priority: High (P1)
Acceptance Criteria:
- URLs extract full article content
- Tags suggest existing options
- Invalid URLs rejected with clear errors
- Knowledge items can belong to multiple lists
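One possible shape for tag normalization and autocomplete is sketched below in Python. The normalization rules (lowercase, accents stripped, separators collapsed to hyphens) and the function names are assumptions; the PRD does not specify them.

```python
import re
import unicodedata


def normalize_tag(raw: str) -> str:
    """Lowercase, strip accents, and collapse separators so 'Dev Ops' and 'dev-ops' match."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")


def suggest_tags(prefix: str, existing: list[str], limit: int = 5) -> list[str]:
    """Return already-normalized existing tags that start with the normalized prefix."""
    needle = normalize_tag(prefix)
    if not needle:
        return []
    return sorted(tag for tag in existing if tag.startswith(needle))[:limit]


print(normalize_tag("  Dev Ops "))
print(suggest_tags("Dev", ["dev-ops", "design", "developers"]))
```

Storing only normalized tags would also make the "Tag consistency" quality metric straightforward to compute.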
Sprint 5-6: AI Integration (Weeks 5-9)
Objective: Enable AI-powered summaries
Deliverables:
- OpenAI API integration
- Target-group-specific prompt templates
- Async summary generation
- Summary regeneration UI
- Token usage tracking
Story Points: 99 SP
Priority: Critical (P0)
Acceptance Criteria:
- Summaries generated within 30 seconds
- Each target group gets unique summary
- Users can regenerate summaries
- Token costs tracked per summary
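The sketch below illustrates, in Python, how target-group-specific prompts and per-summary token tracking might fit together. The focus hints, prompt wording, and the `TokenLedger` class are hypothetical; no OpenAI call is made, and token counts are assumed to come from the provider's response.

```python
from dataclasses import dataclass, field

# Hypothetical focus hints per target group; real templates would be editable by list owners.
GROUP_FOCUS = {
    "Developers": "technical details, APIs, and implementation impact",
    "Marketers": "messaging, positioning, and campaign relevance",
    "CEO/Leadership": "strategic impact, risks, and decisions needed",
    "Service Desk": "customer-facing changes and support procedures",
    "Sales": "customer value and talking points",
    "Operations": "process changes and operational follow-up",
}


def build_prompt(group: str, title: str, content: str, max_words: int = 80) -> str:
    """Compose a summary prompt tailored to one target group."""
    return (
        f"Summarize the following item for {group} in at most {max_words} words.\n"
        f"Focus on {GROUP_FOCUS[group]}.\n\n"
        f"Title: {title}\n\n{content}"
    )


@dataclass
class TokenLedger:
    """Records token usage per generated summary so cost can be tracked."""
    entries: list = field(default_factory=list)

    def record(self, item_id: str, group: str, prompt_tokens: int, completion_tokens: int) -> None:
        self.entries.append((item_id, group, prompt_tokens + completion_tokens))

    def total_for(self, item_id: str) -> int:
        return sum(tokens for entry_id, _, tokens in self.entries if entry_id == item_id)
```

Keeping the templates in data rather than code is one way to support the editable templates named as a mitigation for poor summary quality.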
Sprint 7-8: Digest Automation (Weeks 8-11)
Objective: Automated knowledge distribution
Deliverables:
- Digest scheduling per category
- Digest generation with summaries
- Slack channel/DM delivery
- Digest history in Notion
Story Points: 68 SP
Priority: Critical (P0)
Acceptance Criteria:
- Digests sent on schedule (weekly/biweekly/monthly)
- Digests include AI summaries
- Delivery success rate > 99%
- Users can view digest history
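A scheduler needs to decide when a category's next digest is due. The Python sketch below shows one way to do that, using the frequency names from the Categories database (Daily / Weekly / Bi-weekly / Monthly). The function names and the "same day next month, clamped to month length" rule are assumptions.

```python
import calendar
from datetime import date, timedelta

FREQUENCY_DAYS = {"Daily": 1, "Weekly": 7, "Bi-weekly": 14}


def next_digest_date(last_digest: date, frequency: str) -> date:
    """Return the date on which the next digest should be sent."""
    if frequency == "Monthly":
        # Same day in the following month, clamped for shorter months.
        year = last_digest.year + last_digest.month // 12
        month = last_digest.month % 12 + 1
        day = min(last_digest.day, calendar.monthrange(year, month)[1])
        return date(year, month, day)
    return last_digest + timedelta(days=FREQUENCY_DAYS[frequency])


def is_due(last_digest: date, frequency: str, today: date) -> bool:
    """True when today has reached the next scheduled digest date."""
    return today >= next_digest_date(last_digest, frequency)
```

A cron job or queue worker could call `is_due` once per day for each active category and enqueue digest generation when it returns true.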
Sprint 9-10: Search & Discovery (Weeks 10-13)
Objective: Improve discoverability
Deliverables:
- Advanced search with filters
- Date range filtering
- Tag-based filtering
- Status filtering
- Search result pagination
Story Points: 87 SP
Priority: High (P1)
Acceptance Criteria:
- Search returns relevant results
- Filters work correctly
- Results paginate properly
- Search performance < 500ms
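The filters and pagination listed above can be sketched over an in-memory list, as in the Python example below. The `Item` fields, parameter names, and response shape are illustrative assumptions; a real implementation would most likely push the filters down into the Notion query instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    title: str
    status: str
    tags: frozenset
    created: date


def search(items, query="", tags=(), status=None, created_from=None, created_to=None,
           page=1, page_size=10):
    """Filter by text, tags, status, and date range, then return one page of results."""
    wanted = set(tags)
    matches = [
        item for item in items
        if query.lower() in item.title.lower()
        and wanted <= item.tags  # every requested tag must be present
        and (status is None or item.status == status)
        and (created_from is None or item.created >= created_from)
        and (created_to is None or item.created <= created_to)
    ]
    matches.sort(key=lambda item: item.created, reverse=True)  # newest first
    start = (page - 1) * page_size
    return {"total": len(matches), "page": page, "results": matches[start:start + page_size]}
```

Returning `total` alongside the page lets the Slack UI show "page X of Y" without a second query.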
Sprint 11-12: User Management (Weeks 12-15)
Objective: Add access control
Deliverables:
- Role-based permissions
- User preferences
- Admin interface
- Permission system
Story Points: 95 SP
Priority: Medium (P2)
Acceptance Criteria:
- Users have roles (admin, contributor, viewer)
- Permissions enforced on all endpoints
- Users can set preferences
- Admins can manage users
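A role-to-permission mapping for the three roles named above could look like the Python sketch below. The permission strings are hypothetical; only the role names come from the acceptance criteria.

```python
# Each role inherits the permissions of the role below it.
VIEWER = {"knowledge:read"}
CONTRIBUTOR = VIEWER | {"knowledge:create", "knowledge:update"}
ADMIN = CONTRIBUTOR | {"knowledge:delete", "categories:manage", "users:manage"}

ROLE_PERMISSIONS = {"viewer": VIEWER, "contributor": CONTRIBUTOR, "admin": ADMIN}


def is_allowed(role: str, permission: str) -> bool:
    """Unknown roles get no permissions (deny by default)."""
    return permission in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("contributor", "knowledge:create"))
print(is_allowed("viewer", "knowledge:create"))
```

Checking a single permission per endpoint, rather than a role name, keeps the endpoint code unchanged if roles are later added or split.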
5. Backlog
Extra Story Tickets (201 Total)
Created Tickets:
- Batch 1 Part 1: Content Ingestion Foundation (US-100 to US-115)
- Batch 1 Part 2: Content Submission Features (US-116 to US-130)
- Batch 3: AI Summaries (US-166 to US-200)
Pending Tickets:
- Batch 2: Lists & Categories (US-131 to US-165)
- Batch 4: Digest & Infrastructure (US-201 to US-235)
- Batch 5: Search & User Management (US-236 to US-280)
- Batch 6: Advanced & Analytics (US-281 to US-320)
Total Extra Story Scope: Tracked dynamically in Jira workflow dashboard.
Priority Ordering
Tier 1: Foundation (Must Have) - 35 tickets | 180 SP
- Infrastructure hardening (Redis, queues, retry, rate limiting, health checks)
- Content ingestion extra stories (URL extraction, tag normalization, validation)
- Category management extra stories (list suggestions, multi-list, filtering)
Tier 2: Core Features (High Priority) - 45 tickets | 220 SP
- AI summaries (OpenAI integration, target-group prompts, regeneration)
- Digest system (scheduling, generation, Slack delivery)
- Search & discovery (advanced search, filtering)
Tier 3: Polish (Nice to Have) - 20 tickets | 100 SP
- User management (role-based permissions, preferences)
- Analytics (usage tracking, dashboard)
Story Point Estimates
By Epic:
- Epic 1: Content Ingestion - 92 SP
- Epic 2: Lists & Categories - 81 SP
- Epic 3: AI Summaries - 99 SP
- Epic 4: Digest System - 68 SP
- Epic 5: Infrastructure - 83 SP
- Epic 6: Notion Sync - 47 SP
- Epic 7: Search & Discovery - 87 SP
- Epic 8: User Management - 95 SP
- Epic 9: Analytics - 60 SP
- Epic 10: Web Dashboard - 71 SP
Total: 783 SP
By Priority:
- Critical (P0): 234 SP (29.9%)
- High (P1): 163 SP (20.8%)
- Medium (P2): 186 SP (23.8%)
- Low (P3): 200 SP (25.5%)
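The totals and percentages above can be cross-checked with a few lines of Python, using only the figures listed in this section:

```python
by_epic = {
    "Content Ingestion": 92, "Lists & Categories": 81, "AI Summaries": 99,
    "Digest System": 68, "Infrastructure": 83, "Notion Sync": 47,
    "Search & Discovery": 87, "User Management": 95, "Analytics": 60,
    "Web Dashboard": 71,
}
by_priority = {"Critical (P0)": 234, "High (P1)": 163, "Medium (P2)": 186, "Low (P3)": 200}

total = sum(by_epic.values())
# Both breakdowns should add up to the same backlog total.
assert total == sum(by_priority.values())

for name, points in by_priority.items():
    print(f"{name}: {points} SP ({points / total:.1%})")
```

Both breakdowns sum to 783 SP, and the printed shares match the percentages listed per priority.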
6. Technical Requirements
Architecture
Slack Bot (Node.js) ↔ Symfony Backend ↔ Notion Databases
Supporting services: OpenAI API (Summaries), Redis (Cache/State)
Technology Stack
Input Layer:
- Slack Bot (Node.js 18+, @slack/bolt framework)
- Socket Mode (development) / HTTP Mode (production)
Processing Layer:
- Symfony 7.2 (PHP 8.2)
- Doctrine ORM
- Symfony Messenger (queues)
Storage Layer:
- Notion Databases (primary storage)
- Redis (cache and state management)
- SQLite (local development backup)
AI Layer:
- OpenAI GPT-4 (target-group-specific summaries)
- Configurable AI provider interface
Infrastructure:
- Redis 7+ (caching, view tracking, session management)
- Symfony Messenger (async job processing)
- Monolog (structured logging)
Infrastructure Needs
Development Environment:
- Node.js 18+
- PHP 8.2+
- Composer
- npm/yarn
- Redis (optional for POC)
Production Environment:
- VPS or cloud hosting (AWS/DigitalOcean/Hetzner)
- Redis instance (persistent)
- Process manager (PM2 for Node.js, Supervisor for Symfony)
- Reverse proxy (nginx/Apache)
- SSL certificates
- Monitoring (Prometheus/Grafana recommended)
External Services:
- Slack workspace with bot app installed
- Notion workspace with API access
- OpenAI API account (for MVP)
- Domain name and DNS configuration
API Endpoints
Knowledge Management:
POST /api/knowledge Create knowledge item
GET /api/knowledge List knowledge items
GET /api/knowledge/{id} Get single item
PUT /api/knowledge/{id} Update item
DELETE /api/knowledge/{id} Delete item
POST /api/knowledge/search Search knowledge
Category Management:
POST /api/categories Create category
GET /api/categories List categories
GET /api/categories/{id} Get category
PUT /api/categories/{id} Update category
DELETE /api/categories/{id} Delete category
Sync Operations:
POST /api/notion/sync/bidirectional Full bidirectional sync
POST /api/notion/sync/from-notion Sync from Notion to local
POST /api/notion/sync/categories Sync categories
GET /api/notion/sync/status Check sync status
POST /api/webhooks/notion Notion webhook endpoint
Digest Operations (Planned):
POST /api/digests/generate Generate digest report
GET /api/digests List digests
GET /api/digests/{id} Get digest
POST /api/digests/{id}/send Send digest to Slack
Database Schema (Notion)
Knowledge Database (306e292a15d58004a8cbc222dcd48bb2):
- Title (title) - required
- Content (rich_text) - required, up to 2000 chars
- Status (status) - Draft / Review / Published / Archived
- Tags (multi_select)
- Categories (relation) - link to Categories DB
- Priority (select) - High / Medium / Low
- Target Groups (multi_select) - audience segments
- Source Type (select) - Slack Message / Manual / File / URL
- Source URL (url)
- Author (people)
- AI Summary (rich_text) - generated per target group
- Slack User ID, Message TS, Channel ID (rich_text)
- View Count (number)
- Last Reviewed (date)
- Attachments (files)
- Created, Last Edited (auto)
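As an illustration of how these properties map onto a Notion "create page" request body, here is a Python sketch that only builds the payload and sends nothing. The property-value shapes follow the public Notion API as commonly documented; the helper names are hypothetical, and the property names are taken from the schema above.

```python
def rich_text(value: str) -> list:
    """Wrap a string as a Notion text object, truncated to the 2000-character limit."""
    return [{"text": {"content": value[:2000]}}]


def build_knowledge_page(database_id, title, content, tags, target_groups,
                         source_type, source_url=None, status="Draft"):
    """Build the JSON body for creating a Knowledge item (no request is sent)."""
    properties = {
        "Title": {"title": rich_text(title)},
        "Content": {"rich_text": rich_text(content)},
        "Status": {"status": {"name": status}},
        "Tags": {"multi_select": [{"name": tag} for tag in tags]},
        "Target Groups": {"multi_select": [{"name": group} for group in target_groups]},
        "Source Type": {"select": {"name": source_type}},
    }
    if source_url:
        properties["Source URL"] = {"url": source_url}
    return {"parent": {"database_id": database_id}, "properties": properties}


page = build_knowledge_page(
    "306e292a15d58004a8cbc222dcd48bb2", "Deploy checklist", "Steps before release...",
    tags=["devops"], target_groups=["Developers"], source_type="Slack Message",
)
```

In the actual system this mapping is the job of NotionPropertyMapper; the sketch only shows the general shape of the data it has to produce.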
Categories Database (306e292a15d5805dae13e64bed8519c5):
- Name (title) - required
- Description (rich_text)
- Icon (rich_text) - emoji
- Default Target Groups (multi_select)
- Subscribers (people)
- Digest Frequency (select) - Daily / Weekly / Bi-weekly / Monthly
- Digest Day (select)
- Active (checkbox)
- Knowledge Count (rollup) - auto
- Last Digest (date)
- Created (auto)
Digests Database (306e292a15d580d7a0f6fe8421baff10):
- Title (title) - required
- Category (relation) - link to Categories DB
- Period Start, Period End (date)
- Items Count (number)
- Target Groups (multi_select)
- Generated By (people)
- Status (status) - Generating / Sent / Failed
- Slack Sent (checkbox)
- Recipients (rich_text) - Slack user IDs
- Generated At (auto)
Performance Targets
Current Performance (POC):
- Slack modal response time: ~200ms (target < 500ms)
- Notion API sync latency: ~800ms (target < 2s)
- Sync completion time: ~15s for 19 items (target < 30s)
MVP Performance Targets:
- End-to-end submission latency: < 3s (Slack submit → Notion confirmation)
- Notion webhook processing: < 5s (Notion change → Slack home updated)
- View refresh latency: < 2s (Sync triggered → Modal updated)
- Concurrent view tracking: 100+ views
- Sync reliability: 99.9% (successful syncs / total attempts)
- API response time (p95): < 500ms
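Two of these targets are computed quantities. The Python sketch below shows one common way to compute them: a nearest-rank percentile for p95 latency and a simple success ratio for sync reliability. The choice of the nearest-rank method is an assumption; monitoring tools may interpolate differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def reliability(successes, attempts):
    """Successful syncs divided by total attempts (0.0 when nothing was attempted)."""
    return successes / attempts if attempts else 0.0


latencies_ms = [120, 480, 210, 530, 300]
print(percentile(latencies_ms, 95))
print(reliability(999, 1000) >= 0.999)
```

With only a handful of samples the p95 equals the slowest request, so the target is only meaningful over a reasonably large measurement window.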
Security Requirements
- Slack signature verification (handled by Bolt framework)
- Notion API key stored in environment variables
- Internal API keys for service-to-service communication
- Least-privilege Slack scopes
- HTTPS for all external communication
- Input validation and sanitization
- Rate limiting on all endpoints
- SSRF protection for URL scraping
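For the SSRF item, a common first check is to resolve the target host and reject any address that is not publicly routable. The Python sketch below shows that check; the function name and the injectable `resolve` parameter are assumptions made for testability.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, parsed.port or 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    for info in infos:
        # info[4][0] is the resolved IP address string.
        if not ipaddress.ip_address(info[4][0]).is_global:
            return False  # loopback, private, link-local, and similar ranges
    return bool(infos)
```

This check alone does not stop DNS rebinding: the scraper should connect to the address that was validated (or re-validate after redirects) rather than resolving the hostname a second time.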
Cost Controls
- Rate limiting for LLM API calls
- Per-summary token tracking
- Notion API request monitoring
- Configurable summary length limits
- Caching to reduce API calls
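Rate limiting for LLM and Notion calls is often implemented as a token bucket. The Python sketch below is one such limiter; the class name and the rate of 3 requests per second in the usage lines are assumptions, and the actual provider limits should be checked before choosing values.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.updated = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False when the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


limiter = TokenBucket(rate_per_second=3, capacity=3)  # assumed limit, for illustration
print(limiter.try_acquire())
```

When `try_acquire` returns false, a queue-based worker could requeue the job with a short delay instead of dropping the request.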
7. Success Metrics
User Adoption Metrics
| Metric | Target | Measurement Method |
|---|---|---|
| Active users (weekly) | 20+ | Unique Slack user IDs submitting knowledge |
| Knowledge items submitted | 100+ | Count in Notion database |
| Active categories | 5+ | Categories with items added in last 30 days |
| Digest subscribers | 30+ | Users receiving digests |
| Digest open rate | 60%+ | Slack message read receipts |
Performance Metrics
| Metric | Target | Measurement Method |
|---|---|---|
| Submission latency | < 3s | Slack submit → Notion confirmation |
| AI summary generation | < 30s | Async job completion time |
| Digest delivery success | 99%+ | Successful deliveries / Total attempts |
| API response time (p95) | < 500ms | Symfony monitoring |
| Sync reliability | 99.9%+ | Successful syncs / Total sync attempts |
Quality Metrics
| Metric | Target | Measurement Method |
|---|---|---|
| Error rate | < 1% | Failed requests / Total requests |
| User-reported bugs | < 5/week | Slack feedback channel |
| AI summary quality | 4+/5 | User ratings |
| Tag consistency | 80%+ | Normalized tags / Total tags |
Business Metrics
| Metric | Target | Measurement Method |
|---|---|---|
| Time saved per week | 2+ hours | User survey |
| Knowledge reuse rate | 30%+ | Items viewed > 1 time |
| Cross-team sharing | 50%+ | Items with multiple target groups |
| Digest engagement | 60%+ | Users clicking digest links |
8. Risks & Mitigations
Technical Risks
| Risk | Impact | Mitigation | Priority |
|---|---|---|---|
| Notion API downtime | High | SQLite backup, queue + retries | P0 |
| OpenAI API downtime | High | Allow ingestion without immediate summary | P0 |
| Slack rate limits | Medium | Request throttling, caching | P1 |
| Redis single point of failure | High | Document recovery procedures | P1 |
| Concurrent user race conditions | Medium | Load testing before Phase 3 | P1 |
Product Risks
| Risk | Impact | Mitigation | Priority |
|---|---|---|---|
| Low user adoption | High | User onboarding, training sessions | P0 |
| Poor AI summary quality | High | Editable templates, regenerate support | P0 |
| Scope creep (PDF/audio/video) | Medium | Strict phase gates, defer to post-MVP | P1 |
| Tag taxonomy inconsistency | Medium | Tag normalization, autocomplete | P1 |
| Dutch translation quality | Low | Native speaker review | P2 |
Cost Risks
| Risk | Impact | Mitigation | Priority |
|---|---|---|---|
| OpenAI API costs | High | Token logging, rate limits, length constraints | P0 |
| Notion API quota | Medium | Caching, batch operations | P1 |
| Infrastructure costs | Low | Start small, scale as needed | P2 |
9. Next Steps
Immediate Actions (Week 1)
Team Review of Sprint 0 Work
- Code review by senior developers
- Security review of API endpoints
- Dutch translation review
- UX review of Slack modals
Merge to Main Branch
- Create pull request with Sprint 0 work
- Address review feedback
- Merge to main branch
- Tag release as v0.1.0-poc
Create Remaining Extra Story Tickets
- Batch 2: Lists & Categories (35 tickets)
- Batch 4: Digest & Infrastructure (35 tickets)
- Batch 5: Search & User Management (35 tickets)
- Batch 6: Advanced & Analytics (30 tickets)
Short-Term Actions (Weeks 2-4)
Begin Sprint 1: Infrastructure Hardening
- Set up Redis for persistent state
- Implement queue system with Symfony Messenger
- Add retry logic with exponential backoff
- Implement rate limiting
- Add health check endpoints
Prepare for AI Integration
- Set up OpenAI account
- Design prompt templates for each target group
- Create AI service architecture
- Plan token usage tracking
Production Deployment Planning
- Select hosting provider
- Plan infrastructure setup
- Configure CI/CD pipeline
- Set up monitoring and alerting
Medium-Term Actions (Weeks 5-10)
Implement Core Features
- AI summary generation
- Digest scheduling and delivery
- Advanced search and filtering
Add Production Infrastructure
- Monitoring and logging
- Error tracking (Sentry)
- Performance optimization
- Load testing
User Testing
- Internal beta testing
- Gather feedback
- Iterate on UX
- Refine AI prompts
Appendix A: File References
Documentation Sources
- /IMPLEMENTATION_GUIDE.md
- /project/PRD.md
- /project/PRD_UPDATED.md
- /project/backlog.md
- /project/roadmap.md
- /Architecture.md
POC Status Files
- /home/ubuntu/yappa-knowledge-hub/var/backup-md/POC_COMPLETE.md
- /home/ubuntu/yappa-knowledge-hub/var/backup-md/POC_FINAL_STATUS.md
- /home/ubuntu/POC_COMPLETION_VERIFICATION.md
Jira Extra Story Files
- /home/ubuntu/jira_extra story_tickets_summary.md
- /home/ubuntu/jira_extra story_batch1_part1.md
- /home/ubuntu/jira_extra story_batch1_part2.md
- /home/ubuntu/jira_extra story_batch3.md
- /home/ubuntu/EXTRA STORY_TICKETS.md
Feature Analysis Files
- /home/ubuntu/FEATURE_COMPARISON_MATRIX.md
- /home/ubuntu/COMPLETE_PROJECT_STATUS.md
Appendix B: Configuration
Backend Environment (backend/.env)
# Notion API
NOTION_API_KEY=ntn_...
NOTION_VERSION=2022-06-28
# Database IDs
NOTION_DATABASE_KNOWLEDGE=306e292a15d58004a8cbc222dcd48bb2
NOTION_DATABASE_CATEGORIES=306e292a15d5805dae13e64bed8519c5
NOTION_DATABASE_DIGESTS=306e292a15d580d7a0f6fe8421baff10
# OpenAI (for MVP)
OPENAI_API_KEY=sk-...
# Redis (for MVP)
REDIS_DNS=redis://localhost:6379
REDIS_URL=redis://localhost:6379
# Notion Webhook (for MVP)
NOTION_WEBHOOK_SECRET=your_webhook_secret_here
NOTION_WEBHOOK_URL=https://your-domain.com/api/webhooks/notion
# Slack Bot Integration
SLACK_BOT_URL=http://localhost:3000
Slack Bot Environment (.env)
SLACK_BOT_TOKEN=xoxb-...
SLACK_SIGNING_SECRET=...
SLACK_APP_TOKEN=xapp-...
API_BASE_URL=http://localhost:8000
PORT=3000
# Redis (for view tracking)
REDIS_URL=redis://localhost:6379
Notion URLs
- Knowledge DB: https://notion.so/306e292a15d58004a8cbc222dcd48bb2
- Categories DB: https://notion.so/306e292a15d5805dae13e64bed8519c5
- Digests DB: https://notion.so/306e292a15d580d7a0f6fe8421baff10
Appendix C: Glossary
Thematic List (Category): A curated stream of resources designed for a specific purpose and audience context. Each list has a name, description, icon, default tags, and digest schedule.
Resource (Knowledge Item): A single knowledge item submitted by a user, including title, content, metadata, tags, and AI summaries.
Target Group: A role-based audience segment that shapes summarization (e.g., Developers, Marketers, CEO).
Digest: A periodic report generated per list and tailored per target group, delivered via Slack.
Sprint 0 (POC): Proof of Concept phase validating the technical approach with basic functionality.
MVP: Minimum Viable Product with AI summaries, digest automation, and production infrastructure.
Story Points (SP): Relative measure of effort required to complete a ticket (Fibonacci scale: 1, 2, 3, 5, 8, 13, 21).
Document Version: 4.0 (Consolidated)
Last Updated: 2026-02-20
Next Review: After Sprint 1 completion