Implementing Real-Time Anomaly Detection with OpenObserve and Random Cut Forest

Prabhat Sharma

June 01, 2025

7 min read

What is Anomaly Detection in Machine Learning?

Anomaly detection in machine learning identifies unusual patterns or outliers in data, making it essential for cybersecurity, finance, IT operations, and real-time monitoring. Also known as outlier detection, it helps teams spot deviations in time series data over hours, days, or months before they turn into costly incidents.

Key Use Cases for Anomaly Detection

Fraud detection in banking
System health and performance monitoring
Intrusion detection in cybersecurity
Predictive maintenance for equipment

Organizations leverage anomaly detection systems powered by advanced algorithms like Random Cut Forest or autoencoders to automate monitoring. These systems scale efficiently, reducing manual oversight and enabling real-time anomaly detection for proactive incident response.

What is OpenObserve for Real-Time Monitoring?

OpenObserve is an open-source observability platform designed for real-time monitoring and analysis of logs, metrics, and traces. It provides a unified interface for application debugging, root cause analysis, and real-time anomaly detection, and it scales well in modern cloud and hybrid environments.

Why Choose OpenObserve for Anomaly Detection?

Native support for streaming data and real-time data analysis
Seamless integration with toolchains via flexible APIs
Cost-effective solution for large-scale log and metrics analysis

Teams often build proactive alerts on top of OpenObserve or connect it to machine learning models for more advanced anomaly detection.

How Time Series Anomaly Detection Works

Understanding Time Series Anomaly Detection

Time series anomaly detection analyzes logs, metrics, or streaming data to identify deviations from expected patterns. Common workflows include:

Monitoring log volumes, system metrics, or KPIs over time
Applying machine learning models to flag deviations
Setting up threshold-based or model-based alerts for real-time anomaly detection

Algorithms such as Random Cut Forest, Isolation Forest, and deep learning autoencoders are well suited to detecting anomalies in high-dimensional or complex time series data.

Q: What is time series anomaly detection?
A: It’s the process of using algorithms to detect unusual patterns in time-based data, ideal for monitoring and security applications.

Understanding Random Cut Forest for Anomaly Detection

What is Random Cut Forest (RCF)?

Random Cut Forest (RCF) is an unsupervised anomaly detection algorithm designed for streaming data, which makes it well suited to real-time anomaly detection in time series. It requires no labeled data and handles high-dimensional datasets.

How Random Cut Forest Works

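Forest Construction: Builds an ensemble of random cut trees (around 100 is a common default), each over a random sample of the data
Random Cuts: Splits each tree's data recursively, choosing the cut dimension with probability proportional to its value range and the cut value uniformly within that range
Isolation Measurement: Estimates how easily a point is separated from the rest; points isolated after only a few cuts (shallow in the tree) are more unusual
Anomaly Scoring: Combines the per-tree results into one score, with higher scores for easily isolated points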
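The four steps above can be sketched in plain Python. This is a simplified, batch illustration of the random cut idea, not a production RCF: it follows a query point down a random cut tree (dimension picked in proportion to its range, cut placed uniformly inside that range) and scores the point by how shallow it lands, averaged over many trees. Full RCF implementations maintain trees incrementally and score by the point's displacement in the tree; shallow depth is a common approximation of that score. The function names `cut_depth` and `anomaly_score` are hypothetical.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def cut_depth(points: List[Point], query: Point, rng: random.Random,
              depth: int = 0, max_depth: int = 50) -> int:
    """Number of random cuts needed to separate `query` from the other points."""
    if len(points) <= 1 or depth >= max_depth:
        return depth
    dims = range(len(query))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    spans = [hi - lo for lo, hi in zip(lows, highs)]
    candidates = [d for d in dims if spans[d] > 0]
    if not candidates:
        return depth  # remaining points are identical; no cut can separate them
    # Random cut: dimension chosen with probability proportional to its range,
    # cut value chosen uniformly inside that range.
    dim = rng.choices(candidates, weights=[spans[d] for d in candidates])[0]
    cut = rng.uniform(lows[dim], highs[dim])
    # Only the branch containing the query matters for its depth.
    same_side = [p for p in points if (p[dim] <= cut) == (query[dim] <= cut)]
    return cut_depth(same_side, query, rng, depth + 1, max_depth)


def anomaly_score(sample: List[Point], query: Point, num_trees: int = 100,
                  tree_size: int = 256, seed: int = 0) -> float:
    """Average depth over `num_trees` random trees; shallower means a higher score."""
    rng = random.Random(seed)
    total_depth = 0
    for _ in range(num_trees):
        # Each tree sees its own random subsample plus the query point.
        subsample = rng.sample(sample, min(tree_size, len(sample)))
        total_depth += cut_depth(subsample + [query], query, rng)
    return 1.0 / (1.0 + total_depth / num_trees)
```

A point far from the training cloud is typically isolated within one or two cuts, so it receives a noticeably higher score than a point in the middle of the data.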

Advantages of Random Cut Forest:

Algorithm	Real-Time (Streaming Updates)	No Labels Needed	High-Dimensional	Memory Efficient
Random Cut Forest	✅	✅	✅	✅
Isolation Forest	❌	✅	✅	❌
LSTM Autoencoder	❌	✅	✅	❌
Statistical Methods	✅	✅	❌	✅
SVM (supervised)	❌	❌	✅	❌

Key Parameters for RCF

Number of Trees: 100 as a balanced default
Sample Size: 256 data points per tree
Shingle Size: Number of consecutive observations combined into one input vector (a sliding window), so each point is scored in the context of the values just before it
Anomaly Threshold: 98th percentile of training scores as a starting point
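Shingling, as described in the list above, can be sketched as follows. The helper name `shingle` is hypothetical; libraries that implement RCF typically expose shingle size as a parameter rather than requiring you to build windows yourself.

```python
from typing import List, Sequence


def shingle(series: Sequence[float], size: int) -> List[List[float]]:
    """Return every window of `size` consecutive values, oldest first."""
    if size < 1:
        raise ValueError("shingle size must be at least 1")
    # Window i covers series[i], ..., series[i + size - 1].
    return [list(series[i:i + size]) for i in range(len(series) - size + 1)]
```

For example, per-minute counts `[120, 118, 125, 410]` with a shingle size of 3 become `[[120, 118, 125], [118, 125, 410]]`, so the spike to 410 is judged against the two minutes before it rather than in isolation.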

Because streaming RCF keeps replacing older points in each tree's sample with newer ones, the model adapts to concept drift, which suits time series whose normal behavior evolves over time.

RCF Use Cases

Real-time log anomaly detection
Financial time series anomaly detection
Network and system performance monitoring
Outlier detection in IoT sensor networks

When to Use Random Cut Forest

Ideal for:

Streaming, real-time data
High-dimensional numeric features
Evolving anomaly patterns

Avoid for:

Categorical-only data (requires encoding)
Small datasets
Strongly seasonal data without preprocessing
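Seasonal differencing is one common way to preprocess strongly seasonal data; it is shown here as a general technique, not necessarily what the article's pipeline does. The function name `seasonal_difference` is hypothetical.

```python
from typing import List, Sequence


def seasonal_difference(series: Sequence[float], period: int) -> List[float]:
    """Subtract the value observed one full period earlier: y[t] - y[t - period]."""
    if period < 1:
        raise ValueError("period must be at least 1")
    # The first `period` points have no value one period back, so they are dropped.
    return [series[t] - series[t - period] for t in range(period, len(series))]
```

For per-minute counts with a daily cycle, `period=1440` compares each minute with the same minute on the previous day, so a normal daily peak differences to roughly zero while an unusual spike stands out.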

Implementing Real-Time Anomaly Detection with OpenObserve

Follow these steps to implement a production-ready real-time anomaly detection system using OpenObserve and Random Cut Forest. Source code is available at GitHub: OpenObserve Anomaly Detector.

Prerequisites

OpenObserve 0.14.7+ with Actions enabled
Python 3.8+ and pip
24 hours of historical log data for training
Basic SQL and Python knowledge

Data Preparation: Extracting Features

Extract numeric features (hour, minute, and per-minute log volume) that a machine learning model can consume:

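-- Per-minute log volume; run over a 24-hour time range so each (hour, minute) pair appears once
SELECT
    date_part('hour', to_timestamp(_timestamp / 1000000)) AS hour,     -- _timestamp is in microseconds
    date_part('minute', to_timestamp(_timestamp / 1000000)) AS minute,
    COUNT(*) AS y                                                     -- number of events in that minute
FROM "default"
GROUP BY hour, minute
ORDER BY hour, minute
LIMIT 2000  -- 24 h x 60 min = 1,440 rows, so 2,000 leaves headroom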

Tip: Random Cut Forest works only on numeric inputs, so encode any categorical fields (for example, as one-hot vectors) before training.
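One-hot encoding, mentioned in the tip above, turns each category into its own 0/1 column. This is a minimal sketch; the function name and the example `level` values are hypothetical.

```python
from typing import List, Sequence, Tuple


def one_hot_encode(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Map each categorical value to a 0/1 vector with one slot per category."""
    categories = sorted(set(values))  # sorted so the column order is stable
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        vectors.append(row)
    return categories, vectors
```

Keep the returned `categories` list with the model: at inference time the same column order must be used, and a category never seen in training needs an explicit policy (for example, an all-zero row).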

Step 1: Setting Up OpenObserve Service Account

Create a service account (e.g., anomaly_detector_serviceaccount@openobserve.ai)
Assign a role (anomaly_detector) with Stream, Action Scripts, and Alert Folder permissions
Attach the service account to the role

Step 2: Configuration and Environment Setup

Store the credentials in a .env file so they stay out of your code:

ORIGIN_CLUSTER_URL=https://your-cluster.openobserve.ai
ORIGIN_CLUSTER_TOKEN=your_base64_encoded_service_account_token
OPENOBSERVE_ORG=your_organization_name

Best Practice: Add .env to .gitignore and rotate tokens regularly for security.
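With those credentials in place, a training script can pull the feature query's results over HTTP. The sketch below assumes OpenObserve's SQL search endpoint (`POST /api/{org}/_search` with `sql`, `start_time`, and `end_time` in microseconds, results under `hits`) and passes the base64 token as HTTP Basic credentials; verify both against your OpenObserve version. It also assumes the .env values are already exported into the process environment (for example by a dotenv loader). The function names are hypothetical and may not match the repository's code.

```python
import json
import os
import time
import urllib.request
from typing import Any, Dict, List


def build_search_request(base_url: str, org: str, token: str, sql: str,
                         start_us: int, end_us: int,
                         size: int = 2000) -> urllib.request.Request:
    """Build (but do not send) a POST to the assumed SQL search endpoint."""
    body = {"query": {"sql": sql, "start_time": start_us, "end_time": end_us,
                      "from": 0, "size": size}}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/{org}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def run_search(sql: str, hours: int = 24) -> List[Dict[str, Any]]:
    """Run `sql` over the last `hours` using credentials from the environment."""
    end_us = int(time.time() * 1_000_000)  # timestamps in microseconds
    start_us = end_us - hours * 3600 * 1_000_000
    request = build_search_request(
        os.environ["ORIGIN_CLUSTER_URL"], os.environ["OPENOBSERVE_ORG"],
        os.environ["ORIGIN_CLUSTER_TOKEN"], sql, start_us, end_us)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read()).get("hits", [])
```

Separating request construction from sending keeps the network call in one place and makes the request easy to inspect in tests.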

Step 3: Training the Anomaly Detection Model

Train a robust machine learning anomaly detection model with these steps:

Extract 24+ hours of high-quality, gap-free data
Normalize features and create sliding windows
Fit the Random Cut Forest model
Set anomaly threshold (98th percentile)
Validate and package the model
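The normalization and packaging steps in the list above can be sketched with z-score scaling. The statistics learned in training must be reused unchanged at inference time, so they are packaged alongside the threshold. Function names and the JSON layout are hypothetical; serializing the forest itself depends on the RCF library you use and is not shown.

```python
import json
from typing import Dict, List, Sequence


def fit_zscore(rows: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Learn the per-feature mean and standard deviation from training rows."""
    n = len(rows)
    dims = range(len(rows[0]))
    means = [sum(r[d] for r in rows) / n for d in dims]
    # A constant feature has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((r[d] - means[d]) ** 2 for r in rows) / n) ** 0.5 or 1.0
            for d in dims]
    return {"mean": means, "std": stds}


def apply_zscore(rows: Sequence[Sequence[float]],
                 params: Dict[str, List[float]]) -> List[List[float]]:
    """Scale each feature with the training mean and std."""
    return [[(r[d] - params["mean"][d]) / params["std"][d] for d in range(len(r))]
            for r in rows]


def package_model(params: Dict[str, List[float]], threshold: float) -> str:
    """Serialize what inference needs: normalization stats and the score threshold."""
    return json.dumps({"normalizer": params, "threshold": threshold})
```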

Threshold Tip: Start at 98th percentile to balance sensitivity and reduce false positives.
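Setting the threshold at the 98th percentile amounts to sorting the anomaly scores from the training data and reading off the value below which 98% of them fall. This minimal sketch uses linear interpolation between neighboring values, the same rule as NumPy's default `percentile` method; the function name is hypothetical.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0 to 100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = (len(ordered) - 1) * q / 100  # fractional index into the sorted list
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)
```

At inference time a point is flagged when its score is greater than `percentile(training_scores, 98)`; moving to 99 flags roughly half as many training points.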

Step 4: Deployment and Real-Time Inference

Package and deploy the model as a scheduled OpenObserve Action using the pack.sh and deploy.py scripts from the repository. The Action runs every minute, scores the incoming data, and writes flagged anomalies to a dedicated log stream.
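The per-minute output step described above might look like the sketch below: attach each point's score and an is_anomaly flag, then post the records to a stream. It assumes OpenObserve's JSON ingestion endpoint (`POST /api/{org}/{stream}/_json`) and a hypothetical stream name `anomalies`; the repository's deploy.py and the Action runtime may structure this differently, including how credentials are supplied.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence, Tuple


def build_anomaly_records(points: Sequence[Tuple[int, int, float]],
                          scores: Sequence[float],
                          threshold: float) -> List[Dict[str, Any]]:
    """Attach the model score and an is_anomaly flag to each (hour, minute, y) point."""
    return [
        {"hour": h, "minute": m, "y": y, "score": s, "is_anomaly": s > threshold}
        for (h, m, y), s in zip(points, scores)
    ]


def build_ingest_request(base_url: str, org: str, stream: str, token: str,
                         records: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build (but do not send) a POST of JSON records to the assumed ingestion endpoint."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/{org}/{stream}/_json",
        data=json.dumps(records).encode("utf-8"),
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Writing every scored point, not only the anomalies, makes it straightforward to chart scores next to the raw volume and to alert on `is_anomaly = true`.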

Step 5: Results, Visualization, and Alerts

After deployment, you can visualize anomalies in the time series.

In the example deployment, minutes whose volume exceeds the threshold (about 230,000 events per minute) are flagged with is_anomaly=true and trigger real-time alerts.

Troubleshooting Anomaly Detection Systems

Common problems with time series anomaly detection and how to address them:

All points flagged as anomalies: Ensure sufficient normal data in training.
High false positives: Raise the threshold, for example to the 99th percentile.
Deployment failures: Verify service account permissions and API endpoints.

Tip: Retrain the model regularly so it keeps up with concept drift.

Conclusion

Implementing real-time anomaly detection with OpenObserve and Random Cut Forest offers scalable, intelligent monitoring for time series data. This approach ensures flexibility, actionable alerts, and minimal manual tuning. Start with our Random Cut Forest implementation guide, monitor performance, and refine thresholds for optimal results in your observability stack.

FAQs about Real-Time Anomaly Detection Systems

Q1: What is real-time anomaly detection in machine learning?
A: It is the use of algorithms such as Random Cut Forest to detect outliers in streaming or time series data as the data arrives, which makes it well suited to monitoring and security.

Q2: How does time series anomaly detection work?
A: It models the normal behavior of time-based data and flags deviations from it, using statistical methods or machine learning methods such as Random Cut Forest.

Q3: Can I implement anomaly detection in Python with OpenObserve?
A: Yes. You can pull data from OpenObserve through its API, train a model with Python libraries such as scikit-learn or PyOD, and write the results back to OpenObserve for custom real-time anomaly detection.

Q4: Is Random Cut Forest better than statistical methods?
A: RCF excels in real-time anomaly detection for high-dimensional, streaming data, offering greater flexibility than statistical methods.

Q5: What are common challenges in anomaly detection systems?
A: False positives and concept drift are the most common issues; they call for careful threshold tuning and regular retraining.

About the Author

Prabhat Sharma

Prabhat Sharma is the founder of OpenObserve, bringing extensive expertise in cloud computing, Kubernetes, and observability. His interests also encompass machine learning, liberal arts, economics, and systems architecture. Outside of work, Prabhat enjoys spending quality time playing with his children.
