Streaming Architecture
ApiTap is designed for high-performance, memory-efficient streaming. This document explains the internal Factory Pattern that allows DataFusion to execute SQL over HTTP streams without redundant requests.
Architecture Overview
HTTP
SourceBuffer
FactoryExec
EngineStore
PostgresHTTP Layer
- 1Single RequestInitiates one persistent connection to the API, avoiding redundant handshakes.
- 2Bounded BufferStores data in a memory-efficient channel that acts as a stream factory.
DataFusion Engine
- 1Multi-Pass ExecutionReads from the same memory pointer multiple times (inference + execution) without re-fetching.
- 2Zero-Copy WriteStreams transformed Arrow batches directly to Postgres or Parquet.
The Factory Pattern
A stream factory is a function that creates streams on demand, not the stream itself. This allows DataFusion to re-scan data (e.g., for schema inference or multiple passes) without re-triggering the HTTP request.
Key Concept: The HTTP request happens ONCE. The response is buffered in a bounded channel. The Factory simply creates readers for that buffer.
Code Example
rust
// Type definition
pub type JsonStreamFactory = Arc<
dyn Fn() -> Pin<Box<dyn Stream<Item = Result<Value>> + Send>>
+ Send + Sync
>;Memory Efficiency
Because we use bounded channels, memory usage is deterministic regardless of the API response size.
| Buffer Size | Memory / Pipeline | 1000 Pipelines |
|---|---|---|
| 8192 items | ~8 MB | 8 GB |
| 256 items✓ Recommended | ~256 KB | 256 MB |
