Creating Bulk Data
  • 01 Apr 2022

You can create a Bulk Data Storage object in any Node in your Tenant where you control the processing code.

Internal Nodes

For Internal Nodes (e.g., Processor, Bitmap Router, and Cross Tenant Sending Nodes), you access Bulk Data Storage via the Context object's handle_bulk_data method.

When you call this Context method in your processor function, it will store the Bulk Data for you and return a URL that you can store as a "ticket" in your message.
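
The flow described above can be sketched as follows. This is only an illustration under stated assumptions: StubContext is a hypothetical stand-in for the real Context object, and the processor signature, the bytes argument to handle_bulk_data, and the "ticket" key in the outgoing message are all assumptions rather than the documented EchoStream interface.

```python
import json
from typing import Any, Dict


class StubContext:
    """Hypothetical stand-in for the Context object (NOT the EchoStream implementation)."""

    def __init__(self) -> None:
        self.stored: Dict[str, bytes] = {}

    def handle_bulk_data(self, data: bytes) -> str:
        # The real method stores the data in Bulk Data Storage and returns a URL.
        # Here we only keep it in memory and return a placeholder URL.
        url = f"https://bulk-data.example.invalid/{len(self.stored)}"
        self.stored[url] = data
        return url


def processor(*, context: Any, message: str, **kwargs: Any) -> str:
    """Store a large payload as Bulk Data and emit a small message carrying its ticket."""
    payload = message.encode("utf-8")
    # Assumption: handle_bulk_data accepts bytes and returns the URL to use as the ticket.
    ticket = context.handle_bulk_data(payload)
    return json.dumps({"ticket": ticket, "size": len(payload)})
```

The point of the pattern is that the message itself stays small: only the URL travels through the stream, and downstream Nodes fetch the payload when they need it.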

External and Managed Nodes

Direct API invocation

To create Bulk Data Storage directly for External and Managed Nodes, follow this procedure:

  1. Call the EchoStream API query Query.GetBulkDataStorage. It returns one or more BulkDataStorage objects. Each BulkDataStorage object consists of a presignedPost, a presignedPut, and a presignedGet.

NOTE - if you are transferring bulk data to or from a geographical location outside your Tenant's region, EchoStream recommends that you set useAccelerationEndpoint to true in your GetBulkDataStorage request. This can dramatically speed up transfers compared to the default endpoint.

NOTE - the acceleration endpoint does not support CORS. If you are uploading or downloading bulk data from a browser, you must use the default endpoint.

  2. Execute an HTTPS POST or PUT request using the contents of the presignedPost or presignedPut, respectively.

NOTE - the data that you POST or PUT must be compressed with the content encoding that you specified in your Query.GetBulkDataStorage call (i.e., either gzip or deflate).

To perform a POST using the Python requests library:

import json
import requests
from io import BytesIO 

# presignedPost from GetBulkDataStorage
presigned_post: dict = ...
# Binary data to store, in a BytesIO buffer
buffer: BytesIO = ...
requests.post(
    presigned_post["url"],
    data=json.loads(presigned_post["fields"]),
    files=dict(file=("bulk_data", buffer))
).raise_for_status()

To perform a PUT using the Python requests library:

import json
import requests
from io import BytesIO 

# presignedPut from GetBulkDataStorage
presigned_put: dict = ...
# Binary data to store, in a BytesIO buffer
buffer: BytesIO = ...
requests.put(
    presigned_put["url"],
    headers=json.loads(presigned_put["headers"]),
    data=buffer
).raise_for_status()

  3. Use the presignedGet URL as the "ticket" in your message.
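
The content-encoding requirement in step 2 can be met by compressing the payload before you upload it. The following is a minimal sketch, assuming the encoding names are the lowercase strings gzip and deflate, and that deflate means the zlib-wrapped format used for HTTP content encoding. encode_bulk_data is a hypothetical helper and not part of the EchoStream API.

```python
import gzip
import zlib
from io import BytesIO


def encode_bulk_data(raw: bytes, content_encoding: str = "gzip") -> BytesIO:
    """Compress raw bytes to match the requested content encoding."""
    if content_encoding == "gzip":
        compressed = gzip.compress(raw)
    elif content_encoding == "deflate":
        # Assumption: "deflate" refers to zlib-wrapped data, as in HTTP.
        compressed = zlib.compress(raw)
    else:
        raise ValueError(f"unsupported content encoding: {content_encoding}")
    # A fresh BytesIO is positioned at offset 0, ready to be read by an upload call.
    return BytesIO(compressed)


# Example: prepare a gzip-compressed buffer for upload.
buffer = encode_bulk_data(b"example bulk data" * 1000, "gzip")
```

The resulting buffer can be used as the buffer variable in the POST and PUT examples above.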

Using the echostream-node library

If you're building your External or Managed Node in Python, the easiest way to do this is to use the echostream-node library. This library provides both threaded and asyncio Node classes that you can use or extend.

To create Bulk Data Storage in this case, call the Node's handle_bulk_data method and use the returned URL as the "ticket" in your message. The library handles all of the API steps for you.