Integration Patterns

Snowflake

Learn best practices for integrating bem with Snowflake using an event-driven architecture. This guide details how to set up Snowpipe Auto-Ingest for low-latency data loading.

Updated 8/9/2025

When sending event data from bem to Snowflake, the goal is to create a pipeline that is scalable, reliable, and cost-effective. The best architecture depends on your primary use case for the data. We outline two common patterns below.

Pattern 1: Operational-First via Production Database

This pattern is ideal when the primary, immediate need for the data is to power a live application, such as an internal dashboard or a user-facing feature.

Architecture: bem Event Subscription -> API Gateway -> Lambda -> Production DB (e.g., Postgres) -> CDC/ETL -> Snowflake

How it Works:

  1. The bem webhook sends the event payload to your API Gateway, which triggers a Lambda function.
  2. The Lambda function writes the structured data directly to your production database (e.g., PostgreSQL, MySQL). This makes the data immediately available to your application for real-time workflows.
  3. A separate process, either a periodic ETL job or a Change Data Capture (CDC) stream, then moves the data from your production database to Snowflake for long-term storage and large-scale analytics.
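The Lambda function in step 2 can be sketched as follows. This is a minimal, illustrative example, not bem's official handler: the payload field names (`eventId`, `eventType`) and the `bem_events` table are hypothetical, and the database connection is injected so the handler is easy to unit-test (in production you would open a psycopg2 or similar connection outside the handler).

```python
# Hypothetical Lambda handler for Pattern 1: write the bem webhook payload
# into a production Postgres table. Field and table names are assumptions.
import json

INSERT_SQL = (
    "INSERT INTO bem_events (event_id, event_type, payload) "
    "VALUES (%s, %s, %s) ON CONFLICT (event_id) DO NOTHING"
)

def handler(event, context, conn=None):
    """Persist the bem event from the API Gateway proxy event body.

    `conn` is a DB-API connection (e.g. psycopg2); injecting it keeps the
    handler stateless and testable without a live database.
    """
    body = json.loads(event["body"])
    with conn.cursor() as cur:
        # ON CONFLICT makes the insert idempotent, so webhook retries
        # from bem do not create duplicate rows.
        cur.execute(
            INSERT_SQL,
            (body["eventId"], body["eventType"], json.dumps(body)),
        )
    conn.commit()
    return {"statusCode": 200}
```

The idempotent insert matters here: webhook producers generally retry on timeouts, so the write path should tolerate receiving the same event twice.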

When to use this pattern:

  • When you need to display results to an operator in an internal tool the moment they are processed.
  • When the data triggers an immediate transactional workflow in your application's backend.

Pattern 2: Analytics-First via Cloud Storage (Recommended for Analytics)

This event-driven architecture is the industry standard for ingesting webhook data directly into Snowflake for analytics, BI, and data science. It provides the best balance of performance, reliability, and simplicity for analytical workloads.

Architecture: bem Event Subscription -> AWS API Gateway -> AWS Lambda -> S3 Bucket -> Snowpipe -> Snowflake Table

Why this pattern is recommended:

  • Scalability & Decoupling: S3 acts as a durable, highly available buffer. If your Snowflake instance is undergoing maintenance or there is a temporary issue with the pipe, event data from bem safely accumulates in S3. Snowpipe automatically catches up once the service is available again, ensuring zero data loss.
  • Reliability: The Lambda function is simple and stateless. Its only job is to drop the event payload into an S3 bucket. This is a highly reliable, well-understood pattern that is easy to build, monitor, and maintain.
  • Cost-Effectiveness: Snowpipe is a serverless service billed per second for the compute resources it uses. By landing files in S3 first, it can efficiently load micro-batches of data, which is often more cost-effective for analytics workloads than streaming individual rows.
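The stateless Lambda described above can be sketched as below. This is an assumption-laden example, not a prescribed implementation: the `BEM_EVENTS_BUCKET` environment variable and the `bem-events/` key prefix are illustrative names, and the S3 client is injectable so the handler can be tested without AWS credentials.

```python
# Hypothetical Lambda handler for Pattern 2: drop the raw bem event
# payload into S3, where Snowpipe auto-ingest picks it up.
import json
import os
import uuid
from datetime import datetime, timezone

def handler(event, context, s3=None):
    """Write the raw webhook body to S3 for Snowpipe ingestion."""
    if s3 is None:
        import boto3  # deferred import keeps unit tests AWS-free
        s3 = boto3.client("s3")
    body = event["body"]
    # Date-partitioned keys keep later backfills and lifecycle rules
    # easy to scope; the uuid avoids key collisions under load.
    now = datetime.now(timezone.utc)
    key = f"bem-events/{now:%Y/%m/%d}/{uuid.uuid4()}.json"
    s3.put_object(
        Bucket=os.environ["BEM_EVENTS_BUCKET"],
        Key=key,
        Body=body.encode("utf-8"),
        ContentType="application/json",
    )
    return {"statusCode": 200, "body": json.dumps({"key": key})}
```

On the Snowflake side, a pipe with AUTO_INGEST enabled would then subscribe to the bucket's event notifications and copy each new file into the target table; see Snowflake's Snowpipe documentation for the exact DDL.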

The Best of Both Worlds: A Hybrid Approach

For maximum flexibility, the Lambda function in your integration can perform two actions in parallel:

  1. Write to your Production Database for immediate operational use.
  2. Drop the raw JSON event into S3 for reliable, decoupled ingestion into Snowflake via Snowpipe.
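The two actions above can be combined in one handler, as in this sketch. The `write_to_production_db` and `write_to_s3` helpers are assumed to be the (hypothetical) sink functions from Patterns 1 and 2; a thread pool runs them concurrently, and any failure is re-raised so bem's webhook retry can redeliver the event.

```python
# Hedged sketch of the hybrid handler: both sinks run concurrently.
# write_to_production_db and write_to_s3 are assumed helper functions
# implementing Patterns 1 and 2 respectively.
from concurrent.futures import ThreadPoolExecutor

def handler(event, context, write_to_production_db=None, write_to_s3=None):
    body = event["body"]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(write_to_production_db, body),
            pool.submit(write_to_s3, body),
        ]
        # .result() re-raises any sink failure, surfacing a non-2xx
        # response so the delivery is retried rather than silently lost.
        for f in futures:
            f.result()
    return {"statusCode": 200}
```

One design note: because S3 buffers for Snowpipe independently, a transient database outage only fails the operational path, and a retry from bem brings both sinks back in sync.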

This hybrid pattern serves both your real-time application needs and your long-term analytics requirements without compromise.