Configuration
The MultiplEYE preprocessing pipeline uses a central configuration system to manage all parameters, ensuring reproducible and consistent data processing.
Loading Precedence
The pipeline searches for configuration in the following order:
1. CLI Argument: pass --config_path your_config.yaml when running the preprocessing script.
2. Environment Variable: set MULTIPLEYE_CONFIG to the path of a YAML file.
3. Local Default: multipleye_settings_preprocessing.yaml in your current working directory.
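The lookup order above can be pictured as a simple first-match search. The following is a minimal sketch, not the pipeline's actual code: the function name resolve_config_path is hypothetical, and it assumes that a CLI or environment path is used as given (without checking that the file exists), while the local default is only used if the file is present.

```python
import os
from pathlib import Path
from typing import Optional

# Default filename taken from the documentation above.
LOCAL_DEFAULT = "multipleye_settings_preprocessing.yaml"


def resolve_config_path(cli_path: Optional[str] = None,
                        cwd: Optional[Path] = None) -> Optional[Path]:
    """Return the first configuration source found, or None (hypothetical helper).

    Order: --config_path argument, MULTIPLEYE_CONFIG, local default file.
    """
    # 1. An explicit CLI argument always wins.
    if cli_path:
        return Path(cli_path)
    # 2. Otherwise use the environment variable, if it is set and non-empty.
    env_path = os.environ.get("MULTIPLEYE_CONFIG")
    if env_path:
        return Path(env_path)
    # 3. Finally, look for the default file in the working directory.
    local = (cwd or Path.cwd()) / LOCAL_DEFAULT
    if local.is_file():
        return local
    # Nothing found: the caller is expected to create a template and stop.
    return None
```

Returning None instead of raising keeps the "no configuration" case separate, so the caller can decide to write a template.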
If no configuration is found, the pipeline will:
1. Copy a template to multipleye_settings_preprocessing.yaml in your current directory.
2. Display a message with instructions.
3. Stop execution.
You must then edit the file (at least set data_collection_name) and rerun the command.
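The fallback behaviour described above (copy a template, print instructions, stop) might look roughly like this. This is a sketch under assumptions: bootstrap_config and the template path are hypothetical, and the exit code 1 is an assumption, not something the documentation states.

```python
import shutil
import sys
from pathlib import Path


def bootstrap_config(template: Path, target: Path) -> None:
    """Copy a template into place, explain what to do, and stop (hypothetical)."""
    # Copy the template to the user's working directory.
    shutil.copyfile(template, target)
    # Tell the user which file to edit before rerunning.
    print(f"No configuration found. A template was written to {target}.\n"
          "Edit it (at least set data_collection_name) and rerun the command.")
    # Stop execution; a non-zero exit code is assumed here.
    sys.exit(1)
```

Stopping instead of continuing with template defaults avoids processing data under a placeholder collection name.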
Initial Setup
When you run the pipeline for the first time in a new directory, it will create a template for you.
uv run run_preprocessing
After it stops, open multipleye_settings_preprocessing.yaml and configure your session.
Configuration Settings
Settings are divided into user-configurable parameters and internal constants.
User Settings (Required & Common)
- DATA_COLLECTION_NAME: (Required) A unique identifier for your collection.
  Format: MultiplEYE_[LANG]_[COUNTRY]_[CITY]_[LAB_NO]_[YEAR]
  Example: MultiplEYE_EN_UK_London_1_2026
  Note: This name is assigned to you by the MultiplEYE project. It is used to determine the data and output paths. If it does not match the required 6-part format, the pipeline might fail to resolve certain paths.
- OVERWRITE: true to reprocess existing data, false (default) to skip sessions that have already been processed.
- EXPERIMENT_TYPE: MultiplEYE (default) or MeRID.
- INCLUDE_SESSIONS / EXCLUDE_SESSIONS: Optional lists to filter which sessions are processed.
- INCLUDE_PILOTS: true to include data from pilot folders (default: false).
- EXPECTED_SAMPLING_RATE_HZ: The sampling rate of your eye tracker in Hz (default: 1000).
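Because DATA_COLLECTION_NAME feeds into path resolution, it can help to check its 6-part structure before running the pipeline. This is a minimal sketch: parse_collection_name is a hypothetical helper, and the requirement that LAB_NO and YEAR be numeric is an assumption drawn from the example, not a documented rule.

```python
def parse_collection_name(name: str) -> dict:
    """Split a DATA_COLLECTION_NAME into its six parts, or raise ValueError."""
    parts = name.split("_")
    # The documented format has exactly six underscore-separated parts.
    if len(parts) != 6:
        raise ValueError(
            f"expected MultiplEYE_[LANG]_[COUNTRY]_[CITY]_[LAB_NO]_[YEAR], got {name!r}")
    prefix, lang, country, city, lab_no, year = parts
    # Assumption: LAB_NO is an integer and YEAR is a four-digit number.
    if not lab_no.isdigit() or not (len(year) == 4 and year.isdigit()):
        raise ValueError(f"LAB_NO and YEAR must be numeric in {name!r}")
    return {"prefix": prefix, "lang": lang, "country": country,
            "city": city, "lab_no": int(lab_no), "year": int(year)}
```

For example, parse_collection_name("MultiplEYE_EN_UK_London_1_2026") returns {'prefix': 'MultiplEYE', 'lang': 'EN', 'country': 'UK', 'city': 'London', 'lab_no': 1, 'year': 2026}. A city name containing an underscore would produce more than six parts and be rejected, which matches the documented 6-part requirement.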
Less Commonly Changed User Settings
- DATASET_DIR: The path where your raw data is located. By default, this is data/[DATA_COLLECTION_NAME].
- OUTPUT_DIR: The path where preprocessed data will be saved. By default, this is preprocessed_data/[DATA_COLLECTION_NAME].
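The two defaults above are both derived from the collection name. The sketch below shows that derivation with pathlib; default_dirs is a hypothetical helper, and it leaves the paths relative, since the documentation does not say what they are resolved against.

```python
from pathlib import Path
from typing import Optional, Tuple


def default_dirs(collection_name: str,
                 dataset_dir: Optional[str] = None,
                 output_dir: Optional[str] = None) -> Tuple[Path, Path]:
    """Return (DATASET_DIR, OUTPUT_DIR), filling in the documented defaults."""
    # Raw data defaults to data/<DATA_COLLECTION_NAME>.
    data = Path(dataset_dir) if dataset_dir else Path("data") / collection_name
    # Preprocessed output defaults to preprocessed_data/<DATA_COLLECTION_NAME>.
    out = Path(output_dir) if output_dir else Path("preprocessed_data") / collection_name
    return data, out
```

An explicitly configured path replaces the default entirely; only unset values fall back to the collection-name-based layout.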
Quality Check Thresholds
These settings define the criteria for “GOOD” data quality. Do not change these unless you are a core developer, as they ensure consistency across the MultiplEYE project.
- ACCEPTABLE_NUM_CALIBRATIONS: [min, max] range for the number of calibrations.
- ACCEPTABLE_NUM_VALIDATION: [min, max] range for the number of validations.
- ACCEPTABLE_AVG_VALIDATION_SCORES: Acceptable average validation accuracy.
- ACCEPTABLE_DATA_LOSS_RATIOS: Maximum allowed data loss ratio.
- ACCEPTABLE_RECORDING_DURATIONS: Acceptable session duration range.
- ACCEPTABLE_NUM_PRACTICE_TRIALS: Expected number of practice trials.
- ACCEPTABLE_NUM_TRIALS: Expected minimum number of trials.
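Several of these thresholds are [min, max] ranges. The sketch below shows one way such a range check could work; within_range is a hypothetical helper, the inclusive bounds are an assumption, and the numbers are purely illustrative, not the project's actual thresholds (those ship with the template).

```python
from typing import Sequence


def within_range(value: float, bounds: Sequence[float]) -> bool:
    """Return True if value lies in the inclusive [min, max] range (assumed inclusive)."""
    low, high = bounds
    return low <= value <= high


# Illustrative bounds only; NOT the real MultiplEYE thresholds.
example_bounds = [1, 3]
print(within_range(2, example_bounds))  # True
print(within_range(5, example_bounds))  # False
```

One-sided criteria such as the minimum number of trials would need a simple "value >= minimum" comparison instead.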
Logging Settings
- LOG_LEVEL: General log level (default: INFO).
- CONSOLE_LOG_LEVEL: Minimum level of messages shown in the terminal.
- FILE_LOG_LEVEL: Minimum level of messages saved to preprocessing_logs.txt (usually DEBUG).
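Separate console and file levels are a standard pattern in Python's logging module: one logger with two handlers, each filtering at its own level. This sketch is an illustration of that pattern, not the pipeline's logging code; setup_logging and the logger name are hypothetical, and it does not model how LOG_LEVEL interacts with the other two settings.

```python
import logging


def setup_logging(console_level: str = "INFO", file_level: str = "DEBUG",
                  log_file: str = "preprocessing_logs.txt") -> logging.Logger:
    """Send records to the terminal and to a log file at separate levels."""
    logger = logging.getLogger("preprocessing")  # hypothetical logger name
    # The logger must let through everything that either handler wants.
    logger.setLevel(min(getattr(logging, console_level.upper()),
                        getattr(logging, file_level.upper())))
    # Terminal output (stderr), filtered at the console level.
    console = logging.StreamHandler()
    console.setLevel(console_level.upper())
    # File output, typically more verbose (e.g. DEBUG).
    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(file_level.upper())
    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```

Calling this twice would attach duplicate handlers, so in a notebook it is best called once per session.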
Programmatic Usage (Notebooks)
If you are using the package in a Python script or Jupyter notebook:
from preprocessing import settings
# Load a specific config file
settings.load("path/to/your_config.yaml")
# Access settings
print(settings.DATA_COLLECTION_NAME)
Internal Constants
The settings object also contains technical parameters such as folder names (RAW_DATA_FOLDER, etc.) and regex patterns. These are marked as internal in the template and should not be modified.