BigQuery
Arcfeed provides direct access to historical trade data via Google BigQuery. Data is written to
the dataservice dataset in real time — you query it with standard SQL using your own
GCP project and pay only for your own BigQuery compute. No ETL pipelines, no data exports, no
intermediate storage to manage.
BigQuery access is available at $349/month per GCP project (flat fee). Contact [email protected] to get started.
Table reference
| Property | Value |
|---|---|
| Dataset | dataservice |
| Table | trade_v1 |
| Full reference | your_project.dataservice.trade_v1 |
| Partitioning | DAY on publish_time |
Schema
| Column | Type | Description |
|---|---|---|
| price | STRING | Trade price as a decimal string (e.g. "42000.50") — full precision. Cast to BIGNUMERIC for arithmetic. |
| quantity | STRING | Trade quantity as a decimal string (e.g. "0.00123") — full precision. Cast to BIGNUMERIC for arithmetic. |
| timestamp | BIGNUMERIC | Trade timestamp in Unix nanoseconds. Convert with TIMESTAMP_MICROS(CAST(timestamp / 1000 AS INT64)). |
| subscription_name | STRING | Pub/Sub subscription name. Internal — not needed for most queries. |
| message_id | STRING | Pub/Sub message ID. Stable across redeliveries — use for deduplication. |
| publish_time | TIMESTAMP | Partition column. When the message was published to Pub/Sub. Always filter by this column to avoid full table scans. |
| attributes | JSON | Pub/Sub message attributes. Contains source (exchange) and pair (trading pair). Extract with JSON_VALUE(attributes, '$.source'). |
Cast price and quantity to BIGNUMERIC before doing any arithmetic, not FLOAT64, which loses precision on financial values.

SQL examples
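As a quick sanity check on the precision point, the difference shows up even on trivial values. This snippet is illustrative only:

```sql
-- FLOAT64 accumulates binary rounding error; BIGNUMERIC stays exact
SELECT
  CAST('0.1' AS FLOAT64)    * 3 AS float_result,   -- not exactly 0.3
  CAST('0.1' AS BIGNUMERIC) * 3 AS exact_result;   -- exactly 0.3
```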
Basic query with partition filter
Always include a publish_time filter. Without it, BigQuery scans every partition in
the table.
```sql
-- Fetch recent BTC-USDT trades from Binance
-- Always filter by publish_time to use partition pruning
SELECT
  JSON_VALUE(attributes, '$.source') AS source,
  JSON_VALUE(attributes, '$.pair') AS pair,
  price,
  quantity,
  TIMESTAMP_MICROS(CAST(timestamp / 1000 AS INT64)) AS trade_time,
  publish_time
FROM `your_project.dataservice.trade_v1`
WHERE
  publish_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND JSON_VALUE(attributes, '$.source') = 'binance'
  AND JSON_VALUE(attributes, '$.pair') = 'BTC-USDT'
ORDER BY publish_time DESC
LIMIT 1000;
```

1-minute OHLCV candles
Aggregate trades into OHLCV bars. timestamp is nanoseconds (BIGNUMERIC) — divide by
1000 before passing to TIMESTAMP_MICROS().
```sql
-- 1-minute OHLCV candles for BTC-USDT from Binance
SELECT
  TIMESTAMP_TRUNC(
    TIMESTAMP_MICROS(CAST(timestamp / 1000 AS INT64)),
    MINUTE
  ) AS candle_open_time,
  -- open/close: first and last trade price within the minute
  ARRAY_AGG(CAST(price AS BIGNUMERIC) ORDER BY timestamp ASC LIMIT 1)[OFFSET(0)] AS open,
  MAX(CAST(price AS BIGNUMERIC)) AS high,
  MIN(CAST(price AS BIGNUMERIC)) AS low,
  ARRAY_AGG(CAST(price AS BIGNUMERIC) ORDER BY timestamp DESC LIMIT 1)[OFFSET(0)] AS close,
  SUM(CAST(quantity AS BIGNUMERIC)) AS volume,
  COUNT(*) AS trade_count
FROM `your_project.dataservice.trade_v1`
WHERE
  publish_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND JSON_VALUE(attributes, '$.source') = 'binance'
  AND JSON_VALUE(attributes, '$.pair') = 'BTC-USDT'
GROUP BY candle_open_time
ORDER BY candle_open_time;
```

VWAP calculation
Volume-weighted average price across exchanges. Cast both price and quantity to BIGNUMERIC to preserve precision in the product.
```sql
-- VWAP for BTC-USDT across all sources for the last 24 hours
SELECT
  JSON_VALUE(attributes, '$.source') AS source,
  SUM(CAST(price AS BIGNUMERIC) * CAST(quantity AS BIGNUMERIC))
    / SUM(CAST(quantity AS BIGNUMERIC)) AS vwap,
  SUM(CAST(quantity AS BIGNUMERIC)) AS total_volume,
  COUNT(*) AS trade_count
FROM `your_project.dataservice.trade_v1`
WHERE
  publish_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND JSON_VALUE(attributes, '$.pair') = 'BTC-USDT'
GROUP BY source
ORDER BY total_volume DESC;
```

Trade count by exchange and pair
Summarise activity across all exchanges and pairs for a given time window.
```sql
-- Trade count and volume by exchange and pair for the last hour
SELECT
  JSON_VALUE(attributes, '$.source') AS source,
  JSON_VALUE(attributes, '$.pair') AS pair,
  COUNT(*) AS trade_count,
  SUM(CAST(quantity AS BIGNUMERIC)) AS total_quantity
FROM `your_project.dataservice.trade_v1`
WHERE publish_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY source, pair
ORDER BY trade_count DESC
LIMIT 50;
```

Deduplication via message_id
Pub/Sub delivers messages at least once. Use message_id to deduplicate if your
analysis requires exactly-once semantics.
```sql
-- Deduplicate using message_id
-- Pub/Sub guarantees at-least-once delivery; message_id is stable across redeliveries
SELECT
  message_id,
  JSON_VALUE(attributes, '$.source') AS source,
  JSON_VALUE(attributes, '$.pair') AS pair,
  price,
  quantity,
  TIMESTAMP_MICROS(CAST(timestamp / 1000 AS INT64)) AS trade_time
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY message_id ORDER BY publish_time) AS rn
  FROM `your_project.dataservice.trade_v1`
  WHERE
    publish_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    AND JSON_VALUE(attributes, '$.pair') = 'ETH-USDT'
)
WHERE rn = 1
ORDER BY trade_time DESC;
```

Cost tips
- Always filter by publish_time. It is the partition column. A query without a publish_time filter scans the entire table.
- Cast to BIGNUMERIC, not FLOAT64. Financial calculations require exact decimal arithmetic. FLOAT64 introduces rounding errors on values like 0.1.
- timestamp is nanoseconds. Divide by 1000 before calling TIMESTAMP_MICROS(); the function expects microseconds, not nanoseconds.
- JSON_VALUE() is safe in WHERE. BigQuery evaluates partition pruning before JSON extraction, so filtering on attributes does not prevent partition pruning.
- GCP includes 1 TB/month of query processing free. Most analytical queries on a single day's partition cost cents. Use the BigQuery query validator to preview bytes scanned before running large queries.
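One pattern that follows from these tips: bound publish_time on both sides so BigQuery scans exactly the partitions you need, rather than everything from a start date onward. The dates below are illustrative:

```sql
-- Scan only the 2024-01-15 partition by bounding publish_time on both sides
SELECT COUNT(*) AS trades
FROM `your_project.dataservice.trade_v1`
WHERE
  publish_time >= TIMESTAMP('2024-01-15 00:00:00 UTC')
  AND publish_time < TIMESTAMP('2024-01-16 00:00:00 UTC');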
Access setup
Access is granted as roles/bigquery.dataViewer on the dataservice dataset. This allows your GCP project to query the table directly — no data is copied to your
project.
Once access is granted, replace your_project in the examples above with your GCP
project ID. No additional configuration is required — query the table directly from BigQuery
Studio, bq CLI, or any GCP data tool.
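A minimal smoke test once access is granted, with your GCP project ID substituted for your_project, could be:

```sql
-- Confirm access: count rows published so far today (one partition scanned)
SELECT COUNT(*) AS rows_today
FROM `your_project.dataservice.trade_v1`
WHERE publish_time >= TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY);
```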