content="BFRB dashboard, Helios wrist device, IMU time-series, TOF heatmap, sensor data visualization, interactive dashboard">

BFRB Sensor Data Dashboard

This dashboard provides a comprehensive overview of sensor recordings from the Helios wrist device, including model performance and feature analysis. Current Status: Binary detection achieves strong performance (F1: 0.92), but gesture classification remains challenging (F1: 0.54) due to overlapping feature distributions and difficult class separation. The analysis reveals that gesture classes 2, 3, and 6 are the weakest performers, requiring targeted feature engineering and data augmentation. Use the filters to drill down to sequences or see dataset-wide averages.

Sequence Gesture Distribution

This plot shows the distribution of gesture classes across all recorded sequences. It helps identify class imbalance and highlights which gestures are most common in the dataset.
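
A minimal sketch of how this distribution could be computed, assuming a per-sequence summary file and column names (processed/sequences.csv, gesture) that are illustrative rather than the project's actual layout:

# Minimal sketch (pandas): count sequences per gesture class to inspect imbalance.
import pandas as pd

seq = pd.read_csv("processed/sequences.csv")          # one row per sequence (assumed file)
counts = seq["gesture"].value_counts().sort_index()   # sequences per gesture class (assumed column)

print(counts)
print("Imbalance ratio (max/min):", counts.max() / counts.min())
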
Total Sequences
Gesture Classes
Missing TOF Rows
Missing Thermopile Rows

Dataset Summary


            

Visualizations

IMU plot will appear here when a sequence is selected.
This interactive plot displays the IMU (Inertial Measurement Unit) time-series data for the selected sequence, allowing you to explore movement patterns and sensor signals in detail.
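
A minimal plotting sketch for this IMU view; the file path, sequence_id value, and raw accelerometer column names (acc_x, acc_y, acc_z) are assumptions for illustration:

# Minimal sketch (matplotlib): plot accelerometer channels for one sequence.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("processed/imu.csv")                 # hypothetical processed IMU file
one = df[df["sequence_id"] == "SEQ_000123"]           # hypothetical sequence id

fig, ax = plt.subplots(figsize=(10, 3))
for col in ["acc_x", "acc_y", "acc_z"]:               # assumed raw channel names
    ax.plot(one.index, one[col], label=col)
ax.set_xlabel("sample")
ax.set_ylabel("acceleration")
ax.legend()
plt.show()
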
TOF heatmap will appear here for dataset averages or selected sequences.
This heatmap visualizes Time-of-Flight (TOF) sensor data, showing spatial patterns and sensor coverage for either the dataset average or a specific sequence.
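
A minimal sketch of how a sequence-averaged TOF heatmap could be built; the file path, the tof_ column prefix, and the square pixel grid are assumptions, not the device's confirmed layout:

# Minimal sketch (numpy/matplotlib): average TOF pixel columns over a sequence
# and render them as a heatmap.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("processed/tof.csv")                        # hypothetical processed TOF file
one = df[df["sequence_id"] == "SEQ_000123"]                  # hypothetical sequence id
tof_cols = [c for c in one.columns if c.startswith("tof_")]  # assumed pixel-column prefix

frame = one[tof_cols].mean(axis=0).to_numpy()                # sequence-averaged pixel values
side = int(np.sqrt(frame.size))                              # assumes a square grid, e.g. 8x8

plt.imshow(frame.reshape(side, side), cmap="viridis")
plt.colorbar(label="mean TOF reading")
plt.title("Sequence-averaged TOF heatmap")
plt.show()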

Model Performance

Latest model training and evaluation results are shown below. Key Findings: The binary detection model achieves strong accuracy (F1: 0.92), but gesture classification performance is suboptimal (F1: 0.54) due to significant feature overlap between gesture classes. The analysis shows that classes 2, 3, and 6 are the biggest challenges with F1 scores of 0.40, 0.45, and 0.45 respectively. This suggests the current feature set doesn't provide sufficient separation for these gesture types. Metrics and confusion matrices reflect subject-grouped cross-validation and the impact of recent feature engineering and stacking meta-model updates.
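
A minimal sketch of the subject-grouped cross-validation setup referenced here; X, y, and subject_ids stand for the engineered feature matrix, gesture labels, and per-sequence subject IDs, and RandomForestClassifier is only a placeholder for the actual stacked ensemble:

# Minimal sketch (scikit-learn): group folds by subject so no subject appears
# in both the training and test split of any fold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                # placeholder feature matrix
y = rng.integers(0, 8, size=200)              # placeholder gesture labels (classes 0-7)
subject_ids = rng.integers(0, 10, size=200)   # placeholder subject grouping

cv = GroupKFold(n_splits=5)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X, y, groups=subject_ids, cv=cv, scoring="f1_macro",
)
print("Per-fold macro F1:", scores.round(3), "mean:", scores.mean().round(3))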

Binary F1 (Full Data): 0.92
Gesture F1 (Full Data): 0.54
Overall Score (Full Data): 0.73
Binary F1 (IMU Only): 0.91
Gesture F1 (IMU Only): 0.37
Overall Score (IMU Only): 0.64

Evaluation Visualizations

Gesture Confusion Matrix (Full Data)

Latest from today's run (September 5, 2025, 12:10):
[[455  22  29  17  25  24  38  28]
 [ 15 336  35  88  10  17  76  60]
 [ 31  48 243 114  77  84  15  26]
 [ 29 107 114 270  24  39  23  34]
 [ 21  15  76  14 365 121   9  19]
 [ 13  13  51  18  83 440   8  14]
 [ 64  73   7  20   7   7 282 180]
 [ 33  49   8  18   1   8 166 357]]
This confusion matrix visualizes the gesture classification model's predictions using all available sensors. It highlights which gesture classes are most often confused and where the model performs best or struggles.
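
The per-class scores in the next subsection can be derived directly from this matrix. A minimal numpy sketch; the printed values should reproduce the reported per-class F1 scores (about 0.70 for class 0, 0.40 for class 2):

# Minimal sketch (numpy): per-class precision, recall, and F1 from the matrix above.
import numpy as np

cm = np.array([
    [455,  22,  29,  17,  25,  24,  38,  28],
    [ 15, 336,  35,  88,  10,  17,  76,  60],
    [ 31,  48, 243, 114,  77,  84,  15,  26],
    [ 29, 107, 114, 270,  24,  39,  23,  34],
    [ 21,  15,  76,  14, 365, 121,   9,  19],
    [ 13,  13,  51,  18,  83, 440,   8,  14],
    [ 64,  73,   7,  20,   7,   7, 282, 180],
    [ 33,  49,   8,  18,   1,   8, 166, 357],
])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)     # correct predictions / all predictions of that class
recall = tp / cm.sum(axis=1)        # correct predictions / all true members of that class
f1 = 2 * precision * recall / (precision + recall)

for cls, score in enumerate(f1):
    print(f"Class {cls}: F1 = {score:.2f}")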

Per-Class F1 Scores (Gesture Classification)

Detailed performance breakdown by gesture class from today's run:
Class 0: 0.70 (highest)
Class 1: 0.52
Class 2: 0.40 (lowest)
Class 3: 0.45
Class 4: 0.59
Class 5: 0.64
Class 6: 0.45
Class 7: 0.53
These scores show which gesture classes need the most improvement. Classes 2, 3, and 6 are the weakest performers.

Results Summary & Analysis

Current Performance: Using all available sensors (IMU, thermopile, TOF) yields strong binary classification (F1: 0.92), but gesture classification remains significantly below target (F1: 0.54 vs target ≥0.898). This results in an overall score of 0.73, which fails to meet competition requirements.

Root Causes of Low Gesture F1:

  • Feature Overlap: The hardest gesture classes (2, 3, 6) show substantial overlap in feature distributions, making them difficult to distinguish
  • Class Imbalance: Some gesture classes have fewer training examples, leading to poorer generalization
  • Feature Limitations: Current statistical features (mean, std, RMS, etc.) may not capture the temporal patterns unique to each gesture (see the sketch after this list)
  • Sensor Integration: While additional sensors help, their features may not be optimally combined
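
To illustrate the feature-limitation point above, a minimal sketch of the current style of per-window summary statistics; window is an assumed (n_samples, 3) accelerometer array, and the example shows that reordering the samples leaves these features unchanged, so temporal structure is discarded:

# Minimal sketch (numpy): mean/std/RMS summary features per accelerometer axis.
import numpy as np

def stat_features(window: np.ndarray) -> np.ndarray:
    """Concatenate mean, std, and RMS for each axis of an (n_samples, 3) window."""
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    rms = np.sqrt((window ** 2).mean(axis=0))
    return np.concatenate([mean, std, rms])

# Two windows with the same samples in a different order yield identical features:
rng = np.random.default_rng(0)
window = rng.normal(size=(100, 3))
shuffled = rng.permutation(window)                        # shuffle samples along time
print(np.allclose(stat_features(window), stat_features(shuffled)))  # True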

Critical Issues by Gesture Class:

  • Class 2: F1 = 0.40 (lowest performer) - needs significant feature engineering
  • Class 3: F1 = 0.45 - shows confusion with multiple other classes
  • Class 6: F1 = 0.45 - high misclassification rate
  • Classes 0, 4, 5: Best performers (F1: 0.70, 0.59, 0.64) - can serve as reference

Future Directions & Next Steps:

  • Advanced Feature Engineering: Develop temporal features, frequency domain analysis, and gesture-specific patterns
  • Data Augmentation: Generate synthetic examples for underrepresented classes (2, 3, 6); a simple jitter/scale sketch follows this list
  • Temporal Modeling: Implement RNNs, LSTMs, or attention mechanisms to capture sequence dynamics
  • Feature Selection: Identify and prioritize features that best separate the hard classes
  • Ensemble Optimization: Fine-tune stacking weights and explore alternative ensemble methods
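
A minimal sketch of the jitter-and-scale style of augmentation mentioned above; the noise level and scale range are illustrative starting points, not tuned values:

# Minimal sketch (numpy): perturb existing sensor windows to create synthetic
# examples for the minority gesture classes.
import numpy as np

def augment_window(window: np.ndarray, rng: np.random.Generator,
                   noise_std: float = 0.02, scale_range: float = 0.1) -> np.ndarray:
    """Return a perturbed copy of an (n_samples, n_channels) sensor window."""
    scale = 1.0 + rng.uniform(-scale_range, scale_range, size=(1, window.shape[1]))
    noise = rng.normal(0.0, noise_std, size=window.shape)
    return window * scale + noise

rng = np.random.default_rng(0)
original = rng.normal(size=(100, 3))                        # placeholder IMU window
synthetic = [augment_window(original, rng) for _ in range(5)]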

IMU-only models perform well for binary detection but struggle with gesture recognition, highlighting the value of additional sensors but also the need for better multi-modal feature integration.

Binary Classification Details
Class 0 (Non-gesture): F1: 0.94, Precision: 0.94, Recall: 0.94
Class 1 (Gesture): F1: 0.92, Precision: 0.91, Recall: 0.92

Support: 3038 non-gestures, 5113 gestures. Accuracy: 0.93, Macro-averaged F1: 0.93

Feature Analysis: Hardest Gesture Classes

Below are boxplots and histograms for selected features, comparing gesture classes 2, 3, 6, and 7. Each graph is explained to help you interpret which features best separate the hardest classes and guide further feature engineering.

Summary: Across these plots, the distributions for the different gesture classes overlap substantially, so the current features give the model little basis for separating the classes. To improve classification, we need to engineer new features, or find better representations of the data, so that the classes become more distinct in these plots.

acc_mag_mean

acc_mag_mean boxplot
Boxplot: Shows the distribution of mean acceleration magnitude for each gesture class, highlighting differences and overlap between classes.
acc_mag_mean histogram
Histogram: Displays the frequency of mean acceleration magnitude values, helping to visualize class separation and feature usefulness.

acc_x_mean

acc_x_mean boxplot
Boxplot: Shows the distribution of mean acceleration in the X direction for each gesture class.
acc_x_mean histogram
Histogram: Displays the frequency of mean acceleration X values, useful for spotting class overlap or separation.

acc_y_mean

acc_y_mean boxplot
Boxplot: Shows the distribution of mean acceleration in the Y direction for each gesture class.
acc_y_mean histogram
Histogram: Displays the frequency of mean acceleration Y values, useful for identifying feature separation.

acc_z_mean

acc_z_mean boxplot
Boxplot: Shows the distribution of mean acceleration in the Z direction for each gesture class.
acc_z_mean histogram
Histogram: Displays the frequency of mean acceleration Z values, useful for visualizing class differences and overlap.
For a full set of feature plots, see feature_analysis_plots/ in the project directory.

Next Steps & Future Directions

Based on the current analysis, here are the prioritized actions to improve gesture classification performance and achieve the target F1 score of ≥0.898:

🔧 Immediate Actions (High Priority)

  • Feature Engineering: Develop temporal features (autocorrelation, zero-crossing rates, peak analysis); a sketch of these, together with the spectral features below, follows this list
  • Frequency Analysis: Add spectral features to capture motion rhythms
  • Class 2, 3, 6 Focus: Targeted feature development for weakest classes
  • Data Augmentation: Generate synthetic examples for underrepresented gestures
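
A minimal sketch of the temporal and spectral features listed above (zero-crossing rate, lag-1 autocorrelation, dominant frequency, spectral energy); the sampling rate and the example signal are illustrative assumptions:

# Minimal sketch (numpy): temporal and frequency-domain descriptors for one channel,
# e.g. acceleration magnitude.
import numpy as np

def temporal_spectral_features(x: np.ndarray, fs: float = 50.0) -> dict:
    """Return a few temporal and spectral descriptors for a 1-D signal."""
    x = x - x.mean()
    zero_cross_rate = np.mean(np.abs(np.diff(np.sign(x))) > 0)   # fraction of sign changes
    autocorr_lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]             # lag-1 autocorrelation
    spectrum = np.abs(np.fft.rfft(x)) ** 2                       # power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    dominant_freq = freqs[np.argmax(spectrum[1:]) + 1]           # skip the DC bin
    spectral_energy = spectrum[1:].sum()
    return {
        "zero_cross_rate": zero_cross_rate,
        "autocorr_lag1": autocorr_lag1,
        "dominant_freq_hz": dominant_freq,
        "spectral_energy": spectral_energy,
    }

rng = np.random.default_rng(0)
acc_mag = np.sin(2 * np.pi * 3 * np.arange(200) / 50.0) + 0.1 * rng.normal(size=200)
print(temporal_spectral_features(acc_mag))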

🧠 Advanced Modeling (Medium Priority)

  • Temporal Models: Implement RNNs/LSTMs for sequence understanding (see the sketch after this list)
  • Attention Mechanisms: Focus on important time steps in gestures
  • Multi-modal Fusion: Better integration of IMU, TOF, and thermopile data
  • Ensemble Optimization: Fine-tune stacking weights and model combinations
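
A minimal sketch of a temporal model along these lines: a small LSTM classifier over raw sensor windows (PyTorch). The 8-class output matches this project's gesture count; the 7-channel input, hidden size, and window length are assumptions:

# Minimal sketch (PyTorch): classify a window of raw sensor samples with an LSTM.
import torch
import torch.nn as nn

class GestureLSTM(nn.Module):
    def __init__(self, n_channels: int = 7, hidden: int = 64, n_classes: int = 8):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels) -> logits: (batch, n_classes)
        _, (h_n, _) = self.lstm(x)          # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])

model = GestureLSTM()
logits = model(torch.randn(4, 200, 7))      # 4 windows, 200 samples, 7 channels (assumed)
print(logits.shape)                         # torch.Size([4, 8])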

📊 Analysis & Validation (Ongoing)

  • Feature Importance: Identify which features contribute most to classification (a permutation-importance sketch follows this list)
  • Error Analysis: Deep dive into misclassification patterns
  • Cross-validation: Ensure robust performance across subjects
  • Ablation Studies: Test impact of different feature sets
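
A minimal sketch of one way to rank feature importance (permutation importance); X, y, and the random-forest model are placeholders, and in practice this would run on the project's feature matrix and stacked ensemble inside the subject-grouped CV splits:

# Minimal sketch (scikit-learn): permutation importance on a held-out split.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 16))              # placeholder feature matrix
y = rng.integers(0, 8, size=400)            # placeholder gesture labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, scoring="f1_macro",
                                n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("Most important feature indices:", ranking[:5])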

🎯 Success Metrics & Milestones

Current baseline: Gesture F1 = 0.54, Overall = 0.73. Gesture F1 must rise from 0.54 to ≥0.898 to reach the target, a relative improvement of roughly 66%.

About this Web App

Overview

This dashboard visualizes processed recordings from the Helios wrist device. It provides a dataset summary, interactive IMU time-series plots, and TOF heatmaps for either the dataset average or individual sequences.

How to use

  1. Use "View Mode" to switch between dataset-average and per-sequence views.
  2. Apply the gesture or subject filters to narrow the selection; the sequence selector updates accordingly.
  3. In per-sequence mode, pick a sequence to render IMU time-series and the sequence-averaged TOF heatmap.
  4. If TOF appears empty, check console logs for detected TOF column names or run the preprocessing step to regenerate processed/ files.

Future directions