Impact Analysis System - Technical Documentation

System Architecture

High-Level Architecture

Presentation Layer: Django Templates, AJAX APIs, Static Assets
Business Logic Layer: Data Processor, Statistical Engine, Qualitative Analyzer
Data Access Layer: Django ORM, File Storage, Cache Layer
Infrastructure Layer: PostgreSQL, File System, Redis Cache

Data Flow Pipeline

Data Sources → Processing Pipeline → Quality Assessment → Unified Data Model

Method Execution Engine → Results Generation → AI Interpretation → Export Generation

Mathematical Formulas

Independent T-Test

Formula: t = (x̄₁ - x̄₂) / SE

Where:
SE = sp × √(1/n₁ + 1/n₂)
sp = √[((n₁-1)s₁² + (n₂-1)s₂²) / (n₁+n₂-2)]

Variables:
• x̄₁, x̄₂ = sample means
• s₁, s₂ = sample standard deviations
• n₁, n₂ = sample sizes
• sp = pooled standard deviation
• SE = standard error
• df = n₁ + n₂ - 2 (degrees of freedom)
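The pooled t-statistic above can be sketched in plain Python (a minimal illustration of the pooled-variance formula; it assumes equal variances and does not cover the Welch correction):

```python
import math

def independent_t_test(x1, x2):
    """Pooled-variance independent t-test: t = (x̄₁ - x̄₂) / SE,
    with SE = sp · √(1/n₁ + 1/n₂) and df = n₁ + n₂ - 2."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1_sq = sum((v - m1) ** 2 for v in x1) / (n1 - 1)   # sample variance s₁²
    s2_sq = sum((v - m2) ** 2 for v in x2) / (n2 - 1)   # sample variance s₂²
    sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se, n1 + n2 - 2
```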

Difference-in-Differences (DiD)

Basic Formula: DiD = (Ȳ₁₁ - Ȳ₁₀) - (Ȳ₀₁ - Ȳ₀₀)

Regression Specification:
Yᵢₜ = β₀ + β₁Treatmentᵢ + β₂Postₜ + β₃(Treatmentᵢ × Postₜ) + εᵢₜ

Where: β₃ = DiD estimator (treatment effect)
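The basic four-means formula can be computed directly (a sketch of the difference of group-mean changes; the regression specification with covariates would be estimated separately, e.g. by OLS):

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """DiD = (Ȳ₁₁ - Ȳ₁₀) - (Ȳ₀₁ - Ȳ₀₀): the pre/post change in the
    treatment group minus the pre/post change in the control group."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))
```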

Propensity Score Matching (PSM)

Propensity Score: e(X) = P(T=1|X) = exp(β'X) / (1 + exp(β'X))

Average Treatment Effect on the Treated: ATT = E[Y₁ - Y₀ | T=1]

Standardized Bias:
SB = (x̄_treatment - x̄_control) / √[(s²_treatment + s²_control)/2] × 100

Balance Criterion: Acceptable if |SB| < 10%
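The two PSM formulas can be sketched as follows (the logistic form of e(X) and the standardized-bias balance check; β is assumed to be already estimated, e.g. by logistic regression):

```python
import math

def propensity_score(x, beta):
    """e(X) = exp(β'X) / (1 + exp(β'X)) for one covariate vector x."""
    z = sum(b * v for b, v in zip(beta, x))
    return math.exp(z) / (1 + math.exp(z))

def standardized_bias(treat, control):
    """SB in percent; |SB| < 10 counts as acceptable balance."""
    n_t, n_c = len(treat), len(control)
    m_t, m_c = sum(treat) / n_t, sum(control) / n_c
    var_t = sum((v - m_t) ** 2 for v in treat) / (n_t - 1)
    var_c = sum((v - m_c) ** 2 for v in control) / (n_c - 1)
    return (m_t - m_c) / math.sqrt((var_t + var_c) / 2) * 100
```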

One-Way ANOVA

F-Statistic: F = MSB / MSW

Where:
MSB = SSB / (k-1) [Mean Square Between groups]
MSW = SSW / (N-k) [Mean Square Within groups]
SSB = Σnᵢ(x̄ᵢ - x̄)² [Sum of Squares Between]
SSW = ΣΣ(xᵢⱼ - x̄ᵢ)² [Sum of Squares Within]

Effect Size: η² = SSB / SST [where SST = SSB + SSW, Total Sum of Squares]
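The sums-of-squares decomposition above maps directly to code (a minimal sketch; no p-value lookup, which would require the F-distribution):

```python
def one_way_anova(groups):
    """Return (F, η²): F = MSB / MSW, η² = SSB / SST with SST = SSB + SSW."""
    all_vals = [v for g in groups for v in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k, n_total = len(groups), len(all_vals)
    group_means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means))
    msb = ssb / (k - 1)          # Mean Square Between
    msw = ssw / (n_total - k)    # Mean Square Within
    return msb / msw, ssb / (ssb + ssw)
```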

TF-IDF (Text Analysis)

Term Frequency: tf(t,d) = f(t,d) / Σf(w,d)

Inverse Document Frequency: idf(t,D) = log(|D| / |{d ∈ D : t ∈ d}|)

TF-IDF Score: tfidf(t,d,D) = tf(t,d) × idf(t,D)
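A direct implementation of the three formulas (documents as token lists; the log base is not stated above, so natural log is assumed here):

```python
import math

def tf(term, doc):
    """tf(t,d) = f(t,d) / Σ f(w,d): count of t over total tokens in d."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """idf(t,D) = log(|D| / |{d ∈ D : t ∈ d}|), natural log assumed."""
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

def tfidf(term, doc, docs):
    """tfidf(t,d,D) = tf(t,d) × idf(t,D)."""
    return tf(term, doc) * idf(term, docs)
```

A term that appears in every document gets idf = log(1) = 0, so its TF-IDF score is zero regardless of frequency.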

Data Quality Score

Overall Formula:
Quality_Score = (Completeness × 0.4) + (Consistency × 0.3) + (Validity × 0.3)

Where:
Completeness = (Total_Cells - Missing_Cells) / Total_Cells × 100
Consistency = (1 - Duplicate_Rate) × 100
Validity = (1 - Invalid_Values_Rate) × 100

All three components are on a 0-100 scale, so the weighted Quality_Score is also 0-100.
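The weighted score can be sketched as a single function (rates are fractions in [0, 1]; the 0.4/0.3/0.3 weights come from the formula above):

```python
def quality_score(total_cells, missing_cells, duplicate_rate, invalid_rate):
    """Weighted data-quality score on a 0-100 scale."""
    completeness = (total_cells - missing_cells) / total_cells * 100
    consistency = (1 - duplicate_rate) * 100
    validity = (1 - invalid_rate) * 100
    return completeness * 0.4 + consistency * 0.3 + validity * 0.3
```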

Output Specifications

Statistical Result Record Structure

Component       | Field                     | Type           | Description
Identification  | analysis_job_id           | UUID           | Unique analysis identifier
                | method                    | Enumerated     | Statistical method used
                | outcome_variable          | String         | Variable being analyzed
                | created_timestamp         | DateTime       | When result was created
Core Statistics | treatment_effect          | Decimal(15,6)  | Estimated treatment effect
                | standard_error            | Decimal(15,6)  | Standard error of estimate
                | p_value                   | Decimal(15,10) | Statistical significance
                | confidence_interval_lower | Decimal(15,6)  | Lower CI bound
                | confidence_interval_upper | Decimal(15,6)  | Upper CI bound
                | confidence_level          | Decimal(5,2)   | CI level (default 95.00)
Sample Info     | treatment_group_size      | Integer        | Treatment group N
                | control_group_size        | Integer        | Control group N
                | total_sample_size         | Integer        | Total sample N
                | effective_sample_size     | Integer        | Effective N after matching
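The record structure maps naturally onto a Python dataclass (field names follow the table; the class name and the zero defaults for the sample-size fields are illustrative, and Decimal stands in for the fixed-precision schema types):

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from uuid import UUID, uuid4

@dataclass
class StatisticalResult:
    # Identification
    analysis_job_id: UUID
    method: str                      # enumerated method code, e.g. "did"
    outcome_variable: str
    created_timestamp: datetime
    # Core statistics
    treatment_effect: Decimal
    standard_error: Decimal
    p_value: Decimal
    confidence_interval_lower: Decimal
    confidence_interval_upper: Decimal
    confidence_level: Decimal = Decimal("95.00")
    # Sample info
    treatment_group_size: int = 0
    control_group_size: int = 0
    total_sample_size: int = 0
    effective_sample_size: int = 0
```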

Significance Classification

Level                  | P-Value Range   | Description
highly_significant     | p < 0.01        | Strong evidence against null
significant            | 0.01 ≤ p < 0.05 | Conventional significance
marginally_significant | 0.05 ≤ p < 0.10 | Weak evidence
not_significant        | p ≥ 0.10        | No evidence against null
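The banding above reduces to a cascade of threshold checks:

```python
def classify_significance(p_value):
    """Map a p-value to its significance band."""
    if p_value < 0.01:
        return "highly_significant"
    if p_value < 0.05:
        return "significant"
    if p_value < 0.10:
        return "marginally_significant"
    return "not_significant"
```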

Qualitative Result Structure

Theme Information:
• Theme Name: String (255)
• Theme Description: Text
• Theme Keywords: JSON Array
• Theme Frequency: Integer
• Theme Percentage: Decimal (5,2)

Sentiment Analysis:
• Sentiment Label: very_positive, positive, neutral, negative, very_negative
• Sentiment Score: Decimal (5,3) [-1.000 to 1.000]
• Confidence Score: Decimal (5,2)

Supporting Evidence:
• Sample Quotes: JSON Array (max 5)
• Representative Examples: JSON Array
• Context Information: JSON Object
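Scores in [-1.000, 1.000] have to be bucketed into the five sentiment labels; a minimal mapper is sketched below (the cutoffs ±0.6 and ±0.2 are illustrative assumptions, not values from this specification):

```python
def sentiment_label(score):
    """Map a sentiment score in [-1.0, 1.0] to one of the five labels.
    Thresholds (±0.6, ±0.2) are assumed for illustration only."""
    if score > 0.6:
        return "very_positive"
    if score > 0.2:
        return "positive"
    if score >= -0.2:
        return "neutral"
    if score >= -0.6:
        return "negative"
    return "very_negative"
```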

Export Formats

Power BI Export Package

Data Tables (Excel Workbook):
• Analysis_Summary: Metadata, project info, summary statistics
• Statistical_Results: Method info, effect sizes, significance tests
• Participant_Data: Demographics, baseline/midline/endline values
• Qualitative_Results: Themes, sentiment analysis, word frequencies
• Data_Dictionary: Table descriptions, field definitions

Metadata (JSON):
• Dataset Information: Name, description, version, contact
• Table Relationships: Primary keys, foreign keys, cardinality
• Suggested Measures: Treatment effects, significance rates
• Recommended Visualizations: Bar charts, scatter plots, tables

PDF Report Structure

Report Type   | Pages | Target Audience  | Key Sections
Executive     | 10-15 | Decision-makers  | Summary, key findings, recommendations
Technical     | 20-30 | Researchers      | Detailed methodology, comprehensive results
Comprehensive | 40+   | All stakeholders | Complete analysis, full documentation

Excel Export Structure

Workbook Sheets:
• Summary: Analysis overview, key statistics
• Participant_Data: Individual participant records
• Statistical_Results: Method results with full details
• Qualitative_Results: Themes, sentiment, word frequencies
• Treatment_Assignments: Group assignments and propensity scores
• Data_Dictionary: Variable definitions and descriptions
• Metadata: Analysis information, processing details

Comprehensive Export Package (ZIP)

Archive Contents:
📁 powerbi/ - Dataset files and metadata
📁 reports/ - PDF reports (executive, technical, comprehensive)
📁 data/ - Raw analysis data in multiple formats
📁 qualitative/ - Text analysis exports for external tools
📁 visualizations/ - Charts and dashboard previews
📁 documentation/ - User guides and methodology notes
📁 metadata/ - Export summary and configuration files

Component Architecture

Data Processing Components

CSV Upload Processor: Multi-format support, encoding detection, validation, cleaning
Form Response Processor: Project data extraction, form mapping, response linking
Data Validator: Quality assessment, completeness checking, readiness validation

Statistical Engine Components

Causal Inference Methods: DiD, PSM, IV, RDD
Standard Methods: T-Tests, ANOVA, Regression, Descriptive Statistics
Specialized Methods: Survival Analysis, Panel Data, Heckman Selection

Qualitative Analyzer Components

Text Processing: Preprocessing, tokenization, lemmatization
Content Analysis: Theme extraction, sentiment analysis, text coding
Pattern Recognition: Word frequency, n-gram analysis, entity recognition

AI Service Components

API Client: Authentication, rate limiting, error handling
Prompt Management: Template management, context optimization
Response Processing: Content extraction, quality assessment, caching

Processing Pipeline

Job Lifecycle States

PENDING → IMPORTING → VALIDATING → CONFIGURING → RUNNING → COMPLETED

Any state before COMPLETED may transition to FAILED on an unrecoverable error.
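The lifecycle can be modelled as an enum plus a transition map (a sketch inferred from the linear pipeline above; the string values and the guard function are assumptions, not part of this specification):

```python
from enum import Enum

class JobState(Enum):
    PENDING = "pending"
    IMPORTING = "importing"
    VALIDATING = "validating"
    CONFIGURING = "configuring"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

# Each active state advances to the next stage or drops to FAILED;
# COMPLETED and FAILED are terminal.
TRANSITIONS = {
    JobState.PENDING: {JobState.IMPORTING, JobState.FAILED},
    JobState.IMPORTING: {JobState.VALIDATING, JobState.FAILED},
    JobState.VALIDATING: {JobState.CONFIGURING, JobState.FAILED},
    JobState.CONFIGURING: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
}

def can_transition(src, dst):
    """True if the state machine allows moving from src to dst."""
    return dst in TRANSITIONS[src]
```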

Real-time Progress Tracking

Metrics: Current method, completion percentage, time elapsed, ETA, error count
Channels: WebSocket, AJAX polling, server events, email notifications
Persistence: Database storage, progress logs, checkpoints, recovery points

Error Handling Strategy

Detection: Data errors, processing errors, system errors
Response: Recovery actions, user notification, system response
Outcomes: Graceful degradation, partial results, safe failure

Security Framework

Authentication: Multi-factor authentication, single sign-on, role-based access
Authorization: Organization-based isolation, project-level permissions
Data Protection: Encryption at rest and in transit, PII handling
API Security: Rate limiting, input validation, injection prevention
Compliance: GDPR compliance, data governance, audit logging