Research
Open datasets for space research
Derived datasets for ML research in orbital mechanics and space domain awareness. Built on public sources, processed by our algorithms.
Our approach
We build datasets by processing publicly available data through our algorithms and physics models.
Public inputs
Two-Line Element sets (TLEs) from Space-Track, space weather data from NOAA, ground station locations from public databases. Transparent provenance.
Our processing
Conjunction screening, maneuver detection, orbital mechanics computation, physics-based simulation. The value we add.
Research-ready outputs
Clean, documented datasets with train/val/test splits. Ready for ML research and benchmarking.
Available Q2 2026
We're preparing these datasets for public release. Contact us for early access or research collaboration.
Orbital Intelligence Datasets
Derived from public Two-Line Element data via our orbital analysis algorithms.
Conjunction Events Dataset
Close approach events computed from the public TLE catalog. For each event: time of closest approach, miss distance, relative velocity, and collision probability estimate. Enables research on conjunction screening and collision avoidance.
Processing: Brute-force screening of catalog pairs, filtered by orbital geometry, refined with numerical propagation.
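As a rough illustration of the geometric filtering step, the sketch below applies a perigee/apogee band check to every catalog pair. It is not our production screening code: the function names, the `pad_km` margin, and the sample radii are assumptions made up for the example. The idea is that two orbits whose radius bands are separated by more than the margin cannot produce a close approach, so those pairs can be dropped before any propagation.

```python
from itertools import combinations


def altitude_bands_overlap(orbit_a, orbit_b, pad_km=10.0):
    """Return True if two (perigee_km, apogee_km) radius bands come within pad_km."""
    larger_perigee = max(orbit_a[0], orbit_b[0])
    smaller_apogee = min(orbit_a[1], orbit_b[1])
    # A negative gap means the bands overlap; a small positive gap is within the margin.
    return larger_perigee - smaller_apogee <= pad_km


def candidate_pairs(catalog, pad_km=10.0):
    """Check all pairs and keep only those that pass the band filter."""
    items = sorted(catalog.items())
    return [
        (name_a, name_b)
        for (name_a, orbit_a), (name_b, orbit_b) in combinations(items, 2)
        if altitude_bands_overlap(orbit_a, orbit_b, pad_km)
    ]


# Hypothetical objects: (perigee radius, apogee radius) in km from Earth's center.
catalog = {
    "LEO-A": (6778.0, 6790.0),
    "LEO-B": (6785.0, 6800.0),
    "GEO-C": (42160.0, 42170.0),
}
pairs = candidate_pairs(catalog)  # only the two LEO objects remain as a pair
```

Pairs that survive a filter like this would still need propagation to find the actual time of closest approach and miss distance.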
Maneuver Detection Dataset
Satellite maneuvers detected from TLE sequence discontinuities. Includes maneuver timing, estimated delta-v, and classification (station-keeping, orbit raise, plane change, collision avoidance). Ground truth derived from public conjunction warnings and known events.
Processing: Sequential TLE analysis detecting orbital element discontinuities beyond propagation error thresholds.
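The discontinuity idea can be sketched with a single orbital element. The example below converts TLE mean motion to an approximate semi-major axis and flags jumps between consecutive element sets. The fixed `threshold_km` value, the function names, and the sample mean motions are illustrative assumptions; a realistic detector would derive its threshold from per-object propagation error, and the Keplerian conversion used here ignores the distinction between SGP4 mean elements and osculating elements.

```python
import math

MU_EARTH = 398600.4418  # Earth's gravitational parameter, km^3/s^2


def semi_major_axis_km(mean_motion_rev_per_day):
    """Approximate semi-major axis from mean motion via Kepler's third law."""
    n_rad_per_s = mean_motion_rev_per_day * 2.0 * math.pi / 86400.0
    return (MU_EARTH / n_rad_per_s ** 2) ** (1.0 / 3.0)


def detect_jumps(mean_motions, threshold_km=0.5):
    """Return indices where the semi-major axis changes by more than threshold_km."""
    sma = [semi_major_axis_km(n) for n in mean_motions]
    return [
        i for i in range(1, len(sma))
        if abs(sma[i] - sma[i - 1]) > threshold_km
    ]


# Hypothetical mean-motion history (rev/day) with one abrupt change at index 3.
history = [15.5000, 15.5001, 15.5002, 15.4900, 15.4901]
jumps = detect_jumps(history)
```

In this made-up history the small drift between neighbors stays well under the threshold, while the step at index 3 corresponds to a change of roughly 3 km in semi-major axis.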
Orbit Prediction Dataset
Historical TLE sequences with train/test splits for orbit prediction benchmarking. Given past TLEs, predict future orbital elements. Includes SGP4 baseline predictions for comparison.
Processing: Temporal splits ensuring no leakage, stratified by orbital regime (LEO/MEO/GEO) and object type.
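A minimal sketch of a leakage-free temporal split, assuming each record carries an `epoch` timestamp: everything before a cutoff goes to training, everything after goes to test, and an optional gap is dropped in between. The record fields, the `gap` parameter, and the dates are hypothetical and do not describe the released schema.

```python
from datetime import datetime, timedelta


def temporal_split(records, cutoff, gap=timedelta(0)):
    """Split records into (train, test) around a cutoff epoch.

    Records with epochs in [cutoff, cutoff + gap) are discarded so that the
    earliest test sample is separated from the latest training sample.
    """
    train = [r for r in records if r["epoch"] < cutoff]
    test = [r for r in records if r["epoch"] >= cutoff + gap]
    return train, test


# Hypothetical records for one object; "regime" is kept for later stratification.
records = [
    {"object_id": "A", "regime": "LEO", "epoch": datetime(2024, 1, day)}
    for day in (1, 5, 10, 15, 20)
]
train, test = temporal_split(
    records, cutoff=datetime(2024, 1, 10), gap=timedelta(days=3)
)
```

Applying one global cutoff keeps every orbital regime and object type on both sides of the split, which is one simple way to stay stratified without mixing future data into training.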
Satellite Classification Dataset
Orbital behavior sequences labeled with satellite type and operational status. Labels derived from UCS Satellite Database and public catalogs. For classification research using only orbital elements as input.
Processing: TLE sequences joined with public satellite metadata, with labels cleaned and standardized.
Computed Environment Datasets
Generated from orbital mechanics calculations and physics models. No hardware telemetry required.
Eclipse Timing Dataset
Precise eclipse entry/exit times computed for satellites across orbital regimes. Includes umbra and penumbra durations, sun angle progressions, and beta angle variations. Foundation for power and thermal modeling.
Processing: Shadow geometry computation using conical Earth shadow model and high-precision sun ephemeris.
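One common way to express conical shadow geometry is to compare the apparent disks of the Sun and Earth as seen from the satellite. The sketch below classifies a single position as sunlit, penumbra, or umbra. It assumes a spherical Earth, Earth-centered position vectors in km, and a Sun position supplied by the caller; the function name and constants are illustrative, and no ephemeris is included.

```python
import math

R_EARTH_KM = 6378.137
R_SUN_KM = 695700.0


def shadow_state(sat_km, sun_km):
    """Classify illumination from Earth-centered satellite and Sun positions (km)."""
    to_sun = [s - r for s, r in zip(sun_km, sat_km)]
    to_earth = [-r for r in sat_km]
    d_sun = math.hypot(*to_sun)
    d_earth = math.hypot(*to_earth)

    sun_radius = math.asin(R_SUN_KM / d_sun)        # apparent angular radius of the Sun
    earth_radius = math.asin(R_EARTH_KM / d_earth)  # apparent angular radius of Earth
    cos_sep = sum(a * b for a, b in zip(to_sun, to_earth)) / (d_sun * d_earth)
    separation = math.acos(max(-1.0, min(1.0, cos_sep)))

    if separation >= sun_radius + earth_radius:
        return "sunlit"    # disks do not overlap
    if separation <= earth_radius - sun_radius:
        return "umbra"     # Earth's disk fully covers the Sun
    return "penumbra"      # partial overlap


# Hypothetical geometry: Sun along +x at 1 au, satellite directly behind Earth.
sun = (149597870.7, 0.0, 0.0)
state = shadow_state((-7000.0, 0.0, 0.0), sun)
```

Entry and exit times would then come from evaluating a classifier like this along a propagated trajectory and locating the state transitions.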
Ground Station Visibility Dataset
Computed visibility windows between satellites and ground stations. Includes elevation profiles, azimuth tracks, and theoretical link margins. Based on public ground station network locations.
Processing: Geometric visibility computation with terrain masking and minimum elevation constraints.
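The core of a geometric visibility check is the elevation angle of the satellite above the station's local horizon. The sketch below computes it from Earth-fixed position vectors, assuming a spherical Earth so that local "up" is simply the station's position direction. Terrain masking is not modeled, and the function names, the 10 degree default, and the sample coordinates are assumptions for illustration.

```python
import math


def elevation_deg(station_ecef_km, sat_ecef_km):
    """Elevation of the satellite above the station's horizon (spherical Earth)."""
    line_of_sight = [s - g for s, g in zip(sat_ecef_km, station_ecef_km)]
    up_norm = math.hypot(*station_ecef_km)
    los_norm = math.hypot(*line_of_sight)
    sin_el = sum(
        l * g for l, g in zip(line_of_sight, station_ecef_km)
    ) / (los_norm * up_norm)
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_el))))


def is_visible(station_ecef_km, sat_ecef_km, min_elevation_deg=10.0):
    """Apply a minimum-elevation constraint to the geometric elevation."""
    return elevation_deg(station_ecef_km, sat_ecef_km) >= min_elevation_deg


# Hypothetical station on the equator with a satellite directly overhead.
station = (6371.0, 0.0, 0.0)
overhead = (6871.0, 0.0, 0.0)
el = elevation_deg(station, overhead)
```

A visibility window is the time span over which a check like `is_visible` stays true along the satellite's track.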
Radiation Environment Dataset
Modeled radiation exposure along orbital trajectories. Computed using AP-8/AE-8 trapped particle models and solar proton event data from NOAA. Includes South Atlantic Anomaly transit times and dose rate estimates.
Processing: Orbital position mapped to radiation belt models, correlated with historical space weather indices.
Simulation & Experiment Datasets
Generated from our own simulations and ML experiments. Synthetic but realistic.
Federated Learning Experiment Logs
Training logs from federated learning experiments with simulated Earth-space network constraints. Includes gradient statistics, convergence curves, communication costs, and accuracy by synchronization strategy.
Processing: Actual FL training runs with injected latency, bandwidth limits, and intermittent connectivity matching orbital link profiles.
Model Partitioning Results
Benchmark results for neural network partitioning across distributed infrastructure. Various model architectures tested with different latency/bandwidth constraint profiles. Optimal split points and performance trade-offs.
Processing: Exhaustive evaluation of partition points for standard architectures (ResNet, BERT, etc.) under varied constraints.
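To make the exhaustive search concrete, the sketch below evaluates every split point of a layered model under a simple latency model: layers before the split run on one side, the boundary tensor crosses the link, and the remaining layers run on the other side. The cost model, the parameter names, and all numbers are assumptions for illustration rather than measured benchmark values, and the result is assumed to be needed on the receiving side.

```python
def best_split(device_ms, server_ms, boundary_bytes,
               bandwidth_bytes_per_ms, link_latency_ms):
    """Return (total_latency_ms, k) for the best number k of layers run on-device.

    boundary_bytes[k] is the payload sent across the link when the first k
    layers run on the device (k = 0 sends the raw input, k = n the final output).
    """
    n = len(device_ms)
    results = []
    for k in range(n + 1):
        compute = sum(device_ms[:k]) + sum(server_ms[k:])
        transfer = link_latency_ms + boundary_bytes[k] / bandwidth_bytes_per_ms
        results.append((compute + transfer, k))
    return min(results)


# Hypothetical three-layer model with per-layer compute times in milliseconds.
latency_ms, split = best_split(
    device_ms=[4, 6, 8],
    server_ms=[1.0, 1.5, 2.0],
    boundary_bytes=[600_000, 20_000, 50_000, 40_000],
    bandwidth_bytes_per_ms=1000,
    link_latency_ms=20,
)
```

With these made-up numbers the search picks the split after the first layer, where the boundary tensor is smallest; changing the bandwidth or latency shifts the optimum.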
Gradient Compression Benchmarks
Evaluation of gradient compression techniques for bandwidth-limited distributed training. Compression ratios, reconstruction error, and downstream model accuracy across compression methods and rates.
Processing: Systematic evaluation of quantization, sparsification, and learned compression on standard training tasks.
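As one example of the sparsification family, the sketch below implements plain top-k gradient sparsification and reports the relative reconstruction error. It is a toy pure-Python version written for readability; the function names and the sample gradient are assumptions, and it does not reproduce the benchmark code or its results.

```python
import math


def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries as sorted (index, value) pairs."""
    keep = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return sorted((i, grad[i]) for i in keep)


def densify(sparse, size):
    """Rebuild a dense vector, filling dropped entries with zeros."""
    dense = [0.0] * size
    for index, value in sparse:
        dense[index] = value
    return dense


def relative_error(grad, approx):
    """L2 norm of the residual divided by the L2 norm of the original gradient."""
    residual = math.sqrt(sum((g - a) ** 2 for g, a in zip(grad, approx)))
    return residual / math.sqrt(sum(g * g for g in grad))


# Hypothetical gradient vector; keeping 2 of 6 entries is a 3x reduction in values.
grad = [0.5, -2.0, 0.1, 3.0, -0.2, 0.05]
sparse = top_k_sparsify(grad, k=2)
error = relative_error(grad, densify(sparse, len(grad)))
```

The value count understates the true cost, since each kept entry also needs its index transmitted; a benchmark would account for that overhead when reporting compression ratios.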
Data formats
All datasets are available in two formats to suit different workflows.
Apache Parquet
Columnar format optimized for analytical queries. Best for large-scale processing with Spark, DuckDB, or pandas. Files carry an embedded schema and built-in compression.
CSV
Universal format for maximum compatibility. Works with any tool or language. Includes header row with column names.
Data licensing
| License | Use Case | Cost |
|---|---|---|
| CC BY 4.0 | Academic and commercial use with attribution | Free |
| Enterprise License | Custom processing, private datasets, SLA | Contact us |
Note: Our datasets are derived from public sources. Access to the original Space-Track data requires accepting its user agreement. NOAA data is in the public domain.
Want early access?
Contact us if you're working on space research and need access to our datasets before public release.