Data Pipeline & Loading System
Overview
This guide documents the data loading architecture, validation pipelines, caching strategies, and formatter utilities used across QuranBot.
- Modular loader architecture
- Validation & sanitization layers
- Remote CDN + local fallback caching
- Standardized formatter patterns
Data Pipeline Architecture
QuranBot uses a layered data loading system in src/data/ that supports remote CDN fetching,
local caching, and graceful fallback to hardcoded defaults.
Pipeline Flow
- Config Layer (data-loader-config.js) - Defines API endpoints and language settings
- Fetch Layer (data-cache.js, data-azkar.js) - Fetches from remote CDN with timeout
- Loader Layer (data-loader-*.js) - Parses, validates, and formats data
- Global State - Assigns validated data to global.reciters, global.surahNames, etc.
// src/data/data-loader.js
async function loadAllData() {
  try {
    initializeGlobalLanguages();

    // Loaders run sequentially; each one assigns its result to global state.
    await loadSurahNames();
    await loadReciters();
    await loadQuranRadios();
    await loadAzkarData();
    await loadAzkarImages();

    global.surahNames = normalizeSurahCount(global.surahNames);
    logger.info(`All data loaded: ${global.surahNames?.length || 0} surahs, ${Object.keys(global.reciters || {}).length} reciters`);
    return true;
  } catch (error) {
    logger.error('Error loading all data', error);
    // Initialize all global data structures with safe defaults on failure
    global.surahNames = Array.from({ length: 114 }, (_, i) => `سورة ${i + 1}`);
    global.reciters = {};
    global.quranRadios = [];
    global.azkarData = [];
    // Note: true is returned on this path too, so callers cannot tell a fallback from a successful load.
    return true;
  }
}
Loader Modules
Each data type has a dedicated loader that handles parsing, validation, and global state assignment:
| Module | Target Data | Global Assignment |
|---|---|---|
| data-loader-surah.js | Surah names & metadata | global.surahNames |
| data-loader-reciters.js | Reciter links & durations | global.reciters |
| data-loader-radios.js | Radio station URLs | global.quranRadios |
| data-loader-azkar.js | Azkar text & audio files | global.azkarData |
| data-loader-azkar-images.js | Azkar image URLs | global.azkarImages |
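The parse, validate, and assign steps that each loader performs might look roughly like the sketch below. The function name loadRadiosFromRaw and the input shape (an array of { name, url } objects) are assumptions made for illustration; the actual loaders in src/data/ may be structured differently.

```javascript
// Hypothetical loader sketch: parse -> validate -> format -> assign to global state.
// The input shape ({ name, url }) is assumed for illustration only.
function loadRadiosFromRaw(rawRadios) {
  // Parse: tolerate a missing or malformed payload.
  const list = Array.isArray(rawRadios) ? rawRadios : [];

  const radios = list
    // Validate: keep only entries whose url is a non-blank string.
    .filter((radio) => radio && typeof radio.url === 'string' && radio.url.trim().length > 0)
    // Format: normalize every entry to the same shape.
    .map((radio) => ({
      name: String(radio.name || 'Unknown').trim(),
      url: radio.url.trim(),
    }));

  // Assign: publish the validated result to global state.
  global.quranRadios = radios;
  return radios.length;
}
```

Returning the number of accepted entries gives the caller something to log without reaching back into global state.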
Caching Strategy
Data is cached in three layers to ensure high availability:
1. Remote CDN Cache (data-cache.js)
const remote_data_url = 'https://hub-mgv.github.io/QuranBotData/data_quran.json';

async function loadPersistedCache() {
  try {
    logger.info('Cache Loading data from remote endpoint');
    const response = await fetch(remote_data_url, {
      headers: getBrowserHeaders(),
      timeout: TimeoutRequest('default'),
    });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    const remoteData = await response.json();
    const data = remoteData.cached_data || remoteData;
    if (!data || typeof data !== 'object') throw new Error('Invalid data structure from remote endpoint');
    return data;
  } catch (error) {
    logger.warn('Cache Failed to load from remote endpoint, using fallback data');
    return fallback_dataset;
  }
}
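One caveat about the timeout option above: the standard fetch API, including the one built into Node.js 18+, has no timeout option and ignores it, whereas node-fetch v2 honors it. This guide does not say which implementation QuranBot uses. If it is the built-in fetch, a timeout can be approximated with AbortController, as in the sketch below. The helper name fetchJsonWithTimeout and its parameters are hypothetical.

```javascript
// Hypothetical helper: fetch JSON with a timeout and return a fallback on any failure.
// fetchImpl is injectable so the helper can be exercised without network access.
async function fetchJsonWithTimeout(url, timeoutMs, fallback, fetchImpl = fetch) {
  const controller = new AbortController();
  // Abort the request if it has not completed within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(url, { signal: controller.signal });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    const data = await response.json();
    if (!data || typeof data !== 'object') throw new Error('Invalid data structure');
    return data;
  } catch (error) {
    // Covers network errors, HTTP errors, invalid JSON, and aborts caused by the timeout.
    return fallback;
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

Passing the fallback in as an argument keeps the helper independent of any particular dataset.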
2. Local Fallback Cache
If the remote fetch fails, the bot uses fallback_dataset, which contains 114 generated surah names and empty
reciter arrays.
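A fallback dataset of this kind could be generated as in the sketch below. The key names surahNames and reciters are assumptions; the real fallback_dataset may use a different shape. The name pattern matches the defaults used in loadAllData.

```javascript
// Hypothetical shape: the real fallback_dataset may use different keys.
function buildFallbackDataset() {
  return {
    // 114 placeholder names: "سورة 1" through "سورة 114".
    surahNames: Array.from({ length: 114 }, (_, i) => `سورة ${i + 1}`),
    // No reciters are available without remote data.
    reciters: [],
  };
}
```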
3. Runtime Memory Cache
Loaded data persists in global.* objects for the lifetime of the process.
Validation Layer
All data passes through validation checks before assignment to global state. The validators confirm that required fields are present:
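// src/data/data-loader-validator.js

// A reciter entry needs both a rewaya_id and a server URL.
function validateReciterData(reciter) {
  return Boolean(reciter && reciter.rewaya_id && reciter.server);
}

// A radio entry needs a non-empty url.
function validateRadioData(radio) {
  return Boolean(radio && radio.url && radio.url.length > 0);
}

// Adhkar data must be a non-empty array.
function validateAdhkarData(data) {
  return Array.isArray(data) && data.length > 0;
}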
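As a usage sketch, a validator like these can filter raw entries before they reach global state. The filterValid helper and the sample records are made up for illustration. The reciter validator is adapted from the one above, with a null guard and boolean coercion, so that the snippet runs on its own.

```javascript
// Adapted from the validator above so the snippet is self-contained.
function validateReciterData(reciter) {
  return Boolean(reciter && reciter.rewaya_id && reciter.server);
}

// Hypothetical helper: keep valid entries and report how many were dropped.
function filterValid(items, validate) {
  const list = Array.isArray(items) ? items : [];
  const valid = list.filter((item) => validate(item));
  return { valid, dropped: list.length - valid.length };
}

// Made-up sample records for illustration only.
const rawReciters = [
  { rewaya_id: 1, server: 'https://server.example/reciter-a/' },
  { rewaya_id: 2 }, // missing server: rejected
  null,             // not an object: rejected
];

const result = filterValid(rawReciters, validateReciterData);
console.log(result.valid.length, result.dropped); // prints: 1 2
```

Reporting the dropped count makes schema mismatches visible in logs instead of silently shrinking the dataset.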
Formatter Utilities
Standardized formatters ensure consistent data structure across all sources:
| Function | Input | Output |
|---|---|---|
| formatServerUrl | Server base URL | Ensures trailing / |
| formatSurahUrl | Base URL + surah number | https://server/001.mp3 |
| formatDuration | Surah index | Estimated MM:SS |
| formatReciterName | Raw reciter name | Cleaned name (removes metadata) |
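The two URL formatters can be sketched directly from the table. These are illustrative implementations, not the project's actual code; the range check on the surah number is an added assumption. formatDuration and formatReciterName are left out because the estimation and cleaning rules are not described here.

```javascript
// Sketches of two formatters from the table; the real implementations may differ.

// Ensure the server base URL ends with exactly one trailing slash.
function formatServerUrl(server) {
  return String(server).trim().replace(/\/+$/, '') + '/';
}

// Build a surah audio URL with a zero-padded, three-digit file name.
function formatSurahUrl(server, surahNumber) {
  // Assumption: valid surah numbers are the integers 1 through 114.
  if (!Number.isInteger(surahNumber) || surahNumber < 1 || surahNumber > 114) {
    throw new RangeError(`Invalid surah number: ${surahNumber}`);
  }
  return `${formatServerUrl(server)}${String(surahNumber).padStart(3, '0')}.mp3`;
}

console.log(formatServerUrl('https://server'));   // https://server/
console.log(formatSurahUrl('https://server', 1)); // https://server/001.mp3
```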
Troubleshooting Data Issues
Reciters not loading
Check CDN accessibility:
- Verify https://hub-mgv.github.io/QuranBotData/data_quran.json is reachable
- Check data-loader-validator.js for schema mismatches
- Ensure formatServerUrl outputs valid base URLs
Surah names showing as "سورة 1", "سورة 2"
Indicates fallback to fallback_dataset. Causes:
- CDN returned HTTP error
- JSON structure mismatch (missing surah.suwar)
- Network timeout during fetch
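To confirm at runtime that the bot is running on fallback names, a small diagnostic along these lines could be used. The function name isUsingFallbackSurahNames is hypothetical, and the check assumes that global.surahNames holds plain strings in the generated "سورة N" pattern.

```javascript
// Hypothetical diagnostic: true when every name matches the generated "سورة N" pattern.
function isUsingFallbackSurahNames(names) {
  if (!Array.isArray(names) || names.length === 0) return false;
  return names.every((name, i) => name === `سورة ${i + 1}`);
}
```

Calling isUsingFallbackSurahNames(global.surahNames) after startup and logging a warning when it returns true makes this condition easier to spot.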