
What is Anomaly Detection in Machine Learning?

Anomaly detection in machine learning identifies unusual patterns or outliers in data, making it essential for cybersecurity, finance, IT operations, and real-time monitoring. Also known as outlier detection, it helps teams spot deviations in time series data over hours, days, or months and catch problems before they become costly incidents.

Key Use Cases for Anomaly Detection

  • Fraud detection in banking
  • System health and performance monitoring
  • Intrusion detection in cybersecurity
  • Predictive maintenance for equipment

Organizations leverage anomaly detection systems powered by advanced algorithms like Random Cut Forest or autoencoders to automate monitoring. These systems scale efficiently, reducing manual oversight and enabling real-time anomaly detection for proactive incident response.


What is OpenObserve for Real-Time Monitoring?

OpenObserve is an open-source observability platform designed for real-time monitoring and analysis of logs, metrics, and traces. It provides a unified interface for application debugging, root cause analysis, and real-time anomaly detection. Its scalability makes it well suited to modern cloud and hybrid environments.

Why Choose OpenObserve for Anomaly Detection?

  • Native support for streaming data and real-time data analysis
  • Seamless integration with toolchains via flexible APIs
  • Cost-effective solution for large-scale log and metrics analysis

Teams often use anomaly detection in OpenObserve to build proactive alerts, either with simple thresholds or with machine learning models for more advanced detection.


How Time Series Anomaly Detection Works

Understanding Time Series Anomaly Detection

Time series anomaly detection analyzes logs, metrics, or streaming data to identify deviations from expected patterns. Common workflows include:

  • Monitoring log volumes, system metrics, or KPIs over time
  • Applying machine learning models to score how unusual each observation is
  • Setting up threshold-based or model-based alerts for real-time anomaly detection

Algorithms like Random Cut Forest, Isolation Forest, and deep learning autoencoders excel in detecting anomalies in high-dimensional or complex time series data.
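
To make the threshold-based option concrete, here is a minimal sketch of a rolling z-score check over per-minute counts. It is a simple statistical baseline, not Random Cut Forest, and the function name, window size, and z-score limit are illustrative assumptions rather than values from this guide.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_zscore_flags(values, window=5, z_limit=3.0):
    """Flag values that deviate strongly from the mean of the preceding window."""
    history = deque(maxlen=window)
    flags = []
    for value in values:
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # A flat window has sigma == 0; skip it rather than divide by zero.
            is_anomaly = sigma > 0 and abs(value - mu) / sigma > z_limit
        else:
            is_anomaly = False  # not enough history yet
        flags.append(is_anomaly)
        history.append(value)
    return flags


# Hypothetical per-minute log counts with one obvious spike.
counts = [100, 102, 98, 101, 99, 100, 500, 101]
print(rolling_zscore_flags(counts))
```

A check like this is cheap and easy to reason about, but it only looks at one metric at a time, which is where model-based approaches tend to help.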

Q: What is time series anomaly detection?
A: It’s the process of using algorithms to detect unusual patterns in time-based data, ideal for monitoring and security applications.


Understanding Random Cut Forest for Anomaly Detection

What is Random Cut Forest (RCF)?

Random Cut Forest (RCF) is an unsupervised anomaly detection algorithm optimized for real-time detection in time series data and streaming environments. It requires no labeled data and handles high-dimensional datasets, which makes it a strong fit for time series monitoring.

How Random Cut Forest Works

  • Forest Construction: Builds an ensemble of random cut trees, each over a random sample of the data (100 trees of 256 points each in the configuration below)
  • Random Cuts: Partitions the data by cutting along randomly chosen features at random values
  • Isolation Measurement: Calculates how easily a point is isolated (fewer splits = higher anomaly score)
  • Anomaly Scoring: Assigns higher scores to easily isolated points
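
The isolation idea in the steps above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not a production RCF: it repeats random cuts over a single fixed sample instead of maintaining trees over streaming samples, and it reports the average number of cuts rather than the more refined score that real RCF libraries compute. All function names are hypothetical.

```python
import random


def isolation_depth(point, sample, rng, max_depth=32):
    """Count the random cuts needed to separate `point` from the rest of `sample`."""
    others = [p for p in sample if p != point]
    depth = 0
    while others and depth < max_depth:
        box = others + [point]
        dims = list(range(len(point)))
        spans = [max(p[d] for p in box) - min(p[d] for p in box) for d in dims]
        if sum(spans) == 0:
            break  # nothing left to cut on
        # Choose the cut dimension with probability proportional to its range.
        dim = rng.choices(dims, weights=spans)[0]
        low = min(p[dim] for p in box)
        cut = rng.uniform(low, low + spans[dim])
        # Keep only the points that fall on the same side of the cut as `point`.
        if point[dim] <= cut:
            others = [p for p in others if p[dim] <= cut]
        else:
            others = [p for p in others if p[dim] > cut]
        depth += 1
    return depth


def average_depth(point, sample, trees=100, seed=0):
    """Average the isolation depth over many independent rounds of random cuts."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, sample, rng) for _ in range(trees)) / trees


# A tight 5x5 grid of "normal" points plus one distant outlier.
grid = [(x / 4, y / 4) for x in range(5) for y in range(5)]
outlier = (10.0, 10.0)
sample = grid + [outlier]

print("outlier depth:", average_depth(outlier, sample))
print("inlier depth: ", average_depth((0.5, 0.5), sample))
```

The outlier is usually separated by the first cut, while a point in the middle of the cluster needs several, which is the intuition behind "fewer splits = higher anomaly score".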

Advantages of Random Cut Forest:

The comparison covers Random Cut Forest, Isolation Forest, LSTM Autoencoder, Statistical Methods, and SVM against four criteria: real-time operation, no labels needed, high-dimensional support, and memory efficiency.

Key Parameters for RCF

  • Number of Trees: 100 for balanced performance
  • Sample Size: 256 data points per tree
  • Shingle Size: Number of consecutive observations combined into each sliding window for time series
  • Anomaly Threshold: Default at 98th percentile for robust detection
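
As a rough illustration of the shingle size parameter, the sketch below turns a series into overlapping windows of consecutive values, so each window becomes one multi-dimensional point for the model. The function name and sample numbers are assumptions made for this example.

```python
def make_shingles(series, shingle_size):
    """Return overlapping windows of `shingle_size` consecutive values."""
    if shingle_size < 1:
        raise ValueError("shingle_size must be at least 1")
    return [
        tuple(series[i:i + shingle_size])
        for i in range(len(series) - shingle_size + 1)
    ]


# Hypothetical per-minute log counts containing one spike.
per_minute_counts = [120, 118, 125, 900, 121]
for shingle in make_shingles(per_minute_counts, shingle_size=3):
    print(shingle)
```

Larger shingles let the model see more context around each point, at the cost of higher dimensionality and a longer warm-up before the first window is complete.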

RCF adapts to concept drift, which makes it a good match for time series data where normal behavior evolves.

RCF Use Cases

  • Real-time log anomaly detection
  • Financial time series anomaly detection
  • Network and system performance monitoring
  • Outlier detection in IoT sensor networks

When to Use Random Cut Forest

Ideal for:

  • Streaming, real-time data
  • High-dimensional numeric features
  • Evolving anomaly patterns

Avoid for:

  • Categorical-only data (requires encoding)
  • Small datasets
  • Strongly seasonal data without preprocessing

Implementing Real-Time Anomaly Detection with OpenObserve

Follow these steps to implement a production-ready real-time anomaly detection system using OpenObserve and Random Cut Forest. Source code is available at GitHub: OpenObserve Anomaly Detector.

Prerequisites

  • OpenObserve 0.14.7+ with Actions enabled
  • Python 3.8+ and pip
  • 24 hours of historical log data for training
  • Basic SQL and Python knowledge

Data Preparation: Extracting Features

Extract numeric features (hour, minute, log volume) to feed the anomaly detection model:

SELECT 
    date_part('hour', to_timestamp(_timestamp / 1000000)) AS hour,
    date_part('minute', to_timestamp(_timestamp / 1000000)) AS minute,
    COUNT(*) AS y
FROM "default"
GROUP BY hour, minute
ORDER BY hour, minute
LIMIT 2000

Tip: Convert categorical features to numeric values, since Random Cut Forest operates on numeric inputs.
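
One common way to follow this tip is one-hot encoding. The sketch below assumes each query result row has the hour, minute, and y columns from the query above plus a hypothetical level column; that column and its category list are not part of the original query and are only there to show the encoding.

```python
def one_hot(value, categories):
    """Encode a categorical value as a list of 0.0/1.0 indicators."""
    return [1.0 if value == category else 0.0 for category in categories]


def to_feature_vector(row, levels=("debug", "info", "warn", "error")):
    """Build a numeric feature vector from one result row.

    `level` is a hypothetical categorical column used for illustration.
    """
    numeric = [float(row["hour"]), float(row["minute"]), float(row["y"])]
    return numeric + one_hot(row["level"], levels)


row = {"hour": 14, "minute": 5, "y": 1200, "level": "error"}
print(to_feature_vector(row))
```

Values outside the known category list encode as all zeros, so it is worth deciding up front whether unseen categories should be dropped, bucketed, or treated as their own class.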


Step 1: Setting Up OpenObserve Service Account

  1. Create a service account (e.g., anomaly_detector_serviceaccount@openobserve.ai)
  2. Assign a role (anomaly_detector) with Stream, Action Scripts, and Alert Folder permissions
  3. Attach the service account to the role



Step 2: Configuration and Environment Setup

Configure credentials in a .env file so that secrets stay out of the source code:

ORIGIN_CLUSTER_URL=https://your-cluster.openobserve.ai
ORIGIN_CLUSTER_TOKEN=your_base64_encoded_service_account_token
OPENOBSERVE_ORG=your_organization_name

Best Practice: Add .env to .gitignore and rotate tokens regularly for security.
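
For reference, a .env file like the one above can be read with a few lines of standard-library Python. This is a minimal sketch that assumes plain KEY=VALUE lines; the helper names are hypothetical, and the project's own scripts may load the file differently (for example through a dotenv package).

```python
REQUIRED_KEYS = ("ORIGIN_CLUSTER_URL", "ORIGIN_CLUSTER_TOKEN", "OPENOBSERVE_ORG")


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so base64 padding in the token survives.
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


def load_settings(text):
    """Return the parsed settings, failing early if a required key is missing."""
    config = parse_env(text)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return config


example = """
# placeholder values only
ORIGIN_CLUSTER_URL=https://your-cluster.openobserve.ai
ORIGIN_CLUSTER_TOKEN=dXNlcjpwYXNz==
OPENOBSERVE_ORG=your_organization_name
"""
settings = load_settings(example)
print(sorted(settings))
```

Failing fast on missing keys tends to produce clearer errors than discovering an empty token at request time.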


Step 3: Training the Anomaly Detection Model

Train a robust machine learning anomaly detection model with these steps:

  • Extract 24+ hours of high-quality, gap-free data
  • Normalize features and create sliding windows
  • Fit the Random Cut Forest model
  • Set anomaly threshold (98th percentile)
  • Validate and package the model

Threshold Tip: Start at 98th percentile to balance sensitivity and reduce false positives.
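
The normalization and threshold steps can be sketched without any external libraries. The code below assumes min-max scaling and a linearly interpolated percentile, which are reasonable defaults but not necessarily what the published training script uses; the scores are made-up stand-ins for model output.

```python
def min_max_normalize(values):
    """Scale values to the [0, 1] range; a constant series maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def percentile(values, pct):
    """Linearly interpolated percentile, with pct between 0 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical anomaly scores produced on the training data.
training_scores = [0.2, 0.3, 0.25, 0.4, 0.35, 0.3, 2.5, 0.28, 0.33, 0.31]
threshold = percentile(training_scores, 98)
print("normalized:", min_max_normalize([10, 20, 30]))
print("98th percentile threshold:", threshold)
```

Deriving the threshold from training scores means it moves with the data, so it should be recomputed whenever the model is retrained.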


Step 4: Deployment and Real-Time Inference

Deploy the model using deploy.py and pack.sh for scheduled actions in OpenObserve. Monitor streaming data every minute, flagging anomalies in a dedicated log stream.



Step 5: Results, Visualization, and Alerts

Post-deployment, visualize the flagged anomalies alongside the time series data.

Values exceeding the threshold (~230,000 events/minute) are flagged (is_anomaly=true) and trigger real-time anomaly detection alerts.
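
The flagging rule described here is straightforward to express in code. This sketch assumes each record carries the per-minute count in the y column from the feature query and uses the approximate threshold quoted above; the function name and sample rows are illustrative only.

```python
def flag_anomalies(rows, threshold):
    """Return copies of the rows with an added boolean `is_anomaly` field."""
    flagged = []
    for row in rows:
        annotated = dict(row)  # copy so the input rows are left untouched
        annotated["is_anomaly"] = row["y"] > threshold
        flagged.append(annotated)
    return flagged


# Hypothetical per-minute counts around the ~230,000 events/minute threshold.
rows = [
    {"hour": 14, "minute": 4, "y": 180_000},
    {"hour": 14, "minute": 5, "y": 245_000},
]
for record in flag_anomalies(rows, threshold=230_000):
    print(record)
```

Writing the flag as a plain boolean field keeps alert conditions simple, since an alert only needs to match is_anomaly=true in the output stream.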


Troubleshooting Anomaly Detection Systems

Address common anomaly detection challenges in time series:

  • All points flagged as anomalies: Ensure sufficient normal data in training.
  • High false positives: Raise the threshold, for example to the 99th percentile.
  • Deployment failures: Verify service account permissions and API endpoints.

Tip: Regularly retrain models so they keep up with concept drift.
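
One way to notice the first two problems early is to track what fraction of recent points is being flagged. With a 98th percentile threshold, roughly 2% of points would be expected to exceed it, so a much higher rate hints at drift or too little normal data in training. The sketch below encodes that heuristic; the function names and the tolerance factor are assumptions, not settings from OpenObserve.

```python
def flagged_fraction(scores, threshold):
    """Fraction of scores that exceed the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for score in scores if score > threshold) / len(scores)


def needs_retraining(recent_scores, threshold, expected_rate=0.02, tolerance=5.0):
    """Suggest retraining when far more points are flagged than expected.

    expected_rate=0.02 matches a 98th percentile threshold; tolerance is an
    arbitrary illustrative multiplier.
    """
    return flagged_fraction(recent_scores, threshold) > expected_rate * tolerance


# Hypothetical recent scores: 20% of points exceed the threshold of 0.5.
recent = [0.1] * 80 + [0.9] * 20
print("flagged fraction:", flagged_fraction(recent, 0.5))
print("retrain suggested:", needs_retraining(recent, 0.5))
```

A check like this could run alongside the scheduled inference job and raise its own alert when retraining looks overdue.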


Conclusion

Implementing real-time anomaly detection with OpenObserve and Random Cut Forest offers scalable, intelligent monitoring for time series data. This approach ensures flexibility, actionable alerts, and minimal manual tuning. Start with our Random Cut Forest implementation guide, monitor performance, and refine thresholds for optimal results in your observability stack.


FAQs about Real-Time Anomaly Detection Systems

Q1: What is real-time anomaly detection in machine learning?
A: It’s the use of algorithms like Random Cut Forest to detect outliers in streaming or time series data as it arrives, ideal for monitoring and security.

Q2: How does time series anomaly detection work?
A: It models normal trends in time-based data and flags deviations using statistical or machine learning methods such as Random Cut Forest.

Q3: Can I implement anomaly detection in Python with OpenObserve?
A: Yes. Python libraries such as scikit-learn and PyOD can be used alongside OpenObserve to build custom real-time anomaly detection.

Q4: Is Random Cut Forest better than statistical methods?
A: RCF excels in real-time anomaly detection for high-dimensional, streaming data, offering greater flexibility than statistical methods.

Q5: What are common challenges in anomaly detection systems?
A: Issues like false positives and concept drift are the most common, and they call for careful threshold tuning and regular retraining.

About the Author

Prabhat Sharma

Prabhat Sharma is the founder of OpenObserve, bringing extensive expertise in cloud computing, Kubernetes, and observability. His interests also encompass machine learning, liberal arts, economics, and systems architecture. Outside of work, Prabhat enjoys spending quality time playing with his children.
