Data Pipeline & Loading System

Last Updated: 2026-05-26 • For Data Engineers & System Architects

Overview

This guide documents the data loading architecture, validation pipelines, caching strategies, and formatter utilities used across QuranBot.

  • Modular loader architecture
  • Validation & sanitization layers
  • Remote CDN + local fallback caching
  • Standardized formatter patterns

Data Pipeline Architecture

QuranBot uses a layered data loading system in src/data/ that supports remote CDN fetching, local caching, and graceful fallback to hardcoded defaults.

Pipeline Flow

  1. Config Layer (data-loader-config.js) - Defines API endpoints and language settings
  2. Fetch Layer (data-cache.js, data-azkar.js) - Fetches from remote CDN with timeout
  3. Loader Layer (data-loader-*.js) - Parses, validates, and formats data
  4. Global State - Assigns validated data to global.reciters, global.surahNames, etc.

// src/data/data-loader.js
async function loadAllData() {
    try {
        initializeGlobalLanguages();
        await loadSurahNames();
        await loadReciters();
        await loadQuranRadios();
        await loadAzkarData();
        await loadAzkarImages();
        global.surahNames = normalizeSurahCount(global.surahNames);
        logger.info(`All data loaded: ${global.surahNames?.length || 0} surahs, ${Object.keys(global.reciters || {}).length} reciters`);
        return true;
    } catch (error) {
        logger.error('Error loading all data', error);
        // Initialize global data structures with safe defaults on failure
        global.surahNames = Array.from({ length: 114 }, (_, i) => `سورة ${i + 1}`);
        global.reciters = {};
        global.quranRadios = [];
        global.azkarData = [];
        // Still returns true: startup continues with the defaults above.
        // Note that global.azkarImages is not reset here.
        return true;
    }
}

Loader Modules

Each data type has a dedicated loader that handles parsing, validation, and global state assignment:

| Module | Target Data | Global Assignment |
|---|---|---|
| data-loader-surah.js | Surah names & metadata | global.surahNames |
| data-loader-reciters.js | Reciter links & durations | global.reciters |
| data-loader-radios.js | Radio station URLs | global.quranRadios |
| data-loader-azkar.js | Azkar text & audio files | global.azkarData |
| data-loader-azkar-images.js | Azkar image URLs | global.azkarImages |
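
As a rough illustration of the parse, validate, and assign sequence that a loader performs, the sketch below handles radio stations. The `radios` key, the `name` field, and the function names `isValidRadio` and `loadQuranRadiosFrom` are assumptions made for illustration. Only `global.quranRadios` comes from this guide, and the actual loader may differ.

```javascript
// Hypothetical loader sketch; the real data-loader-radios.js may be structured differently.

// Accept only entries that carry a non-empty string URL.
function isValidRadio(radio) {
    return Boolean(radio) && typeof radio.url === 'string' && radio.url.length > 0;
}

function loadQuranRadiosFrom(rawData) {
    // Parse: tolerate a missing or malformed payload (the `radios` key is assumed)
    const entries = Array.isArray(rawData?.radios) ? rawData.radios : [];

    // Validate: drop anything that fails the check
    const valid = entries.filter(isValidRadio);

    // Format: normalize whitespace so every consumer sees the same shape
    const formatted = valid.map((radio) => ({
        name: String(radio.name || '').trim(),
        url: radio.url.trim(),
    }));

    // Assign: publish the validated result to global state
    global.quranRadios = formatted;
    return formatted.length;
}
```

Returning the number of accepted entries gives the caller something to log without reaching back into global state.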

Caching Strategy

Data is cached in three layers to ensure high availability:

1. Remote CDN Cache (data-cache.js)

const remote_data_url = 'https://hub-mgv.github.io/QuranBotData/data_quran.json';

async function loadPersistedCache() {
    try {
        logger.info('Cache Loading data from remote endpoint');
        const response = await fetch(remote_data_url, {
            headers: getBrowserHeaders(),
            // `timeout` is a node-fetch v2 option; the built-in fetch ignores it
            timeout: TimeoutRequest('default'),
        });
        if (!response.ok) throw new Error(`HTTP ${response.status}`);
        const remoteData = await response.json();
        // The payload may be wrapped in `cached_data` or delivered bare
        const data = remoteData.cached_data || remoteData;
        if (!data || typeof data !== 'object') throw new Error('Invalid data structure from remote endpoint');
        return data;
    } catch (error) {
        // Any network, HTTP, or parse failure falls through to the fallback data
        logger.warn('Cache Failed to load from remote endpoint, using fallback data');
        return fallback_dataset;
    }
}
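
The `timeout` option shown above is honored by node-fetch v2 but not by the built-in `fetch` in Node 18 and later. If the project runs on the built-in `fetch`, a timeout can be enforced with `AbortController` instead. The following is a minimal sketch under that assumption. `fetchJsonWithTimeout`, `timeoutMs`, `fetchImpl`, and `fallback` are hypothetical names, not part of the QuranBot codebase.

```javascript
// Hypothetical helper: fetch JSON with an abort-based timeout and a fallback value.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function fetchJsonWithTimeout(url, { timeoutMs = 10000, fetchImpl = fetch, fallback = null } = {}) {
    const controller = new AbortController();
    // Abort the request if it has not settled within timeoutMs
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
        const response = await fetchImpl(url, { signal: controller.signal });
        if (!response.ok) throw new Error(`HTTP ${response.status}`);
        const body = await response.json();
        if (!body || typeof body !== 'object') throw new Error('Invalid data structure');
        return body;
    } catch (error) {
        // Timeout, network error, HTTP error, or bad JSON: use the fallback
        return fallback;
    } finally {
        // Always clear the timer so it cannot keep the process alive
        clearTimeout(timer);
    }
}
```

Clearing the timer in `finally` matters on the success path too; otherwise a pending ten-second timer would delay process exit in short-lived scripts.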

2. Local Fallback Cache

If the remote fetch fails, the bot uses fallback_dataset, which contains 114 generated surah names and empty reciter arrays.
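
A fallback of this kind could be built as in the sketch below. The real key names inside fallback_dataset are not shown in this guide, so `surahNames` and `reciters` are assumed here, as is the `buildFallbackDataset` function name. The placeholder name pattern matches the one used in loadAllData.

```javascript
const SURAH_COUNT = 114;

// Hypothetical shape: the actual fallback_dataset keys may differ.
function buildFallbackDataset() {
    // Object.freeze is shallow; it only guards against reassigning the top-level keys
    return Object.freeze({
        surahNames: Array.from({ length: SURAH_COUNT }, (_, i) => `سورة ${i + 1}`),
        reciters: [],
    });
}

const fallbackSketch = buildFallbackDataset();
console.log(fallbackSketch.surahNames.length); // 114
console.log(fallbackSketch.surahNames[0]);     // سورة 1
```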

3. Runtime Memory Cache

Loaded data persists in global.* objects for the lifetime of the process.

Validation Layer

All data passes through validation before it is assigned to global state. The validators check that the required fields are present:

// src/data/data-loader-validator.js
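// Each validator returns a strict boolean and tolerates null or undefined input.
function validateReciterData(reciter) {
    return Boolean(reciter && reciter.rewaya_id && reciter.server);
}
function validateRadioData(radio) {
    return Boolean(radio) && typeof radio.url === 'string' && radio.url.length > 0;
}
function validateAdhkarData(data) {
    return Array.isArray(data) && data.length > 0;
}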
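
To show how such a validator might be applied before assignment, the sketch below splits a batch of records into accepted and rejected groups. It uses a null-safe variant of validateReciterData. `partitionByValidity` and the sample records are hypothetical and only serve the illustration.

```javascript
// Null-safe variant of the reciter validator, returning a strict boolean
function validateReciterData(reciter) {
    return Boolean(reciter && reciter.rewaya_id && reciter.server);
}

// Hypothetical helper: separate records that pass validation from those that fail
function partitionByValidity(records, validate) {
    const valid = [];
    const invalid = [];
    for (const record of Array.isArray(records) ? records : []) {
        (validate(record) ? valid : invalid).push(record);
    }
    return { valid, invalid };
}

const sample = [
    { rewaya_id: 1, server: 'https://server.example/reciter/' },
    { rewaya_id: 2 }, // missing server
    null,             // malformed entry
];
const { valid, invalid } = partitionByValidity(sample, validateReciterData);
console.log(valid.length, invalid.length); // 1 2
```

Keeping the rejected records, rather than silently filtering them out, makes it possible to log how many entries a schema change has broken.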

Formatter Utilities

Standardized formatters ensure consistent data structure across all sources:

| Function | Input | Output |
|---|---|---|
| formatServerUrl | Server base URL | Ensures trailing / |
| formatSurahUrl | Base URL + surah number | https://server/001.mp3 |
| formatDuration | Surah index | Estimated MM:SS |
| formatReciterName | Raw reciter name | Cleaned name (removes metadata) |
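
The first two formatters can be sketched from the descriptions above. This is an illustrative reimplementation, not the project's source: the three-digit zero padding is inferred from the `001.mp3` example, and the 1 to 114 range check is an added assumption. formatDuration and formatReciterName are omitted because their rules are not specified here.

```javascript
// Sketch: ensure a server base URL ends with exactly one trailing slash
function formatServerUrl(server) {
    const trimmed = String(server || '').trim();
    if (!trimmed) return '';
    return trimmed.endsWith('/') ? trimmed : `${trimmed}/`;
}

// Sketch: build the audio URL for a surah, zero-padding the number to three digits
function formatSurahUrl(server, surahNumber) {
    const n = Number(surahNumber);
    if (!Number.isInteger(n) || n < 1 || n > 114) {
        throw new RangeError(`Surah number must be an integer from 1 to 114, got ${surahNumber}`);
    }
    return `${formatServerUrl(server)}${String(n).padStart(3, '0')}.mp3`;
}

console.log(formatSurahUrl('https://server', 1));    // https://server/001.mp3
console.log(formatSurahUrl('https://server/', 114)); // https://server/114.mp3
```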

Troubleshooting Data Issues

Reciters not loading

Check CDN accessibility and the downstream loader steps:

  • Verify https://hub-mgv.github.io/QuranBotData/data_quran.json is reachable
  • Check data-loader-validator.js for schema mismatches
  • Ensure formatServerUrl outputs valid base URLs

Surah names showing as "سورة 1", "سورة 2"

This indicates that the bot fell back to fallback_dataset. Possible causes:

  • CDN returned HTTP error
  • JSON structure mismatch (missing surah.suwar)
  • Network timeout during fetch
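
One way to confirm this condition at runtime is to test whether every loaded name matches the generated placeholder pattern. The sketch below assumes that surah names are stored as plain strings, as in the fallback shown in loadAllData. `isUsingFallbackSurahNames` is a hypothetical helper, not an existing function.

```javascript
// Matches the generated placeholder form: the word "سورة", a space, then ASCII digits
const PLACEHOLDER_NAME = /^سورة \d+$/;

// Hypothetical diagnostic: true only when every name is a generated placeholder
function isUsingFallbackSurahNames(surahNames) {
    if (!Array.isArray(surahNames) || surahNames.length === 0) return false;
    return surahNames.every((name) => typeof name === 'string' && PLACEHOLDER_NAME.test(name));
}

const generated = Array.from({ length: 114 }, (_, i) => `سورة ${i + 1}`);
console.log(isUsingFallbackSurahNames(generated));             // true
console.log(isUsingFallbackSurahNames(['الفاتحة', 'البقرة'])); // false
```

Calling it with `global.surahNames` after startup would indicate whether the remote data was actually applied.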