boto3 StreamingBody to BytesIO


boto3 class intro

  • A Session manages a particular configuration state (credentials, region, and so on). Creating a custom session:
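import boto3

session = boto3.session.Session()
ads = session.client('s3')       # low-level S3 client ('ads' is not a valid service name)
ads_s3 = session.resource('s3')  # high-level S3 service resource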
  • Resources represent an object-oriented interface to AWS. Every resource instance has a number of attributes and methods, which can conceptually be split into identifiers, attributes, actions, references, sub-resources, and collections.
obj = ads_s3.Object(bucket_name="boto3", key="test.mdf")
  • A client provides low-level access to the service API. Commonly used S3 client methods include:
copy_object()
delete_object()
create_bucket()
delete_bucket()
delete_objects()
download_file()
download_fileobj()
get_bucket_location()
get_bucket_policy()
get_object()
head_bucket()
head_object()
list_buckets()
list_objects()
put_bucket_policy()
put_object()
upload_file()
upload_fileobj()
  • The S3 service resource has Bucket and Object sub-resources, as well as related actions.

  • Bucket is an abstract resource representing an S3 bucket.

b = ads_s3.Bucket('name')
  • Object is an abstract resource representing an S3 object.
obj = ads_s3.Object('bucket_name', 'key')

read an S3 object & pipe it to mdfreader

There were a few rounds of trial and error. The first was to stream the S3 object as a BufferedReader, which gives a file-like object that supports read(). But this BufferedReader behaves more like an IO stream than a file: it can't seek.

botocore.response.StreamingBody as BufferedReader

The following discussion is really helpful:
boto3 issue #426: how to use botocore.response.StreamingBody as stdin PIPE

Looking at the code of StreamingBody, it seems to be really a wrapper around a class inheriting from io.IOBase, but only the read method of the raw stream is exposed, so it is not really a file-like object. It would make a lot of sense to expose the io.IOBase interface in StreamingBody, as we could then wrap S3 objects in an io.BufferedReader or an io.TextIOWrapper. read() gets a binary string. The actual file-like object is found in the ._raw_stream attribute of the StreamingBody class.
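As a rough illustration of the wrapping the issue describes, here is a stdlib-only sketch. `NonSeekableRaw` is a hypothetical stand-in for the raw HTTP stream behind `StreamingBody._raw_stream` (it is not a boto3 or botocore class), so the example runs without AWS. It layers an `io.TextIOWrapper` over an `io.BufferedReader` to read text lines from a stream that cannot seek.

```python
import io


class NonSeekableRaw(io.RawIOBase):
    """Hypothetical stand-in for a non-seekable raw stream such as an HTTP response body."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Copy as many bytes as fit into the caller's buffer; returning 0 signals EOF.
        n = min(len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n
    # seekable() is inherited from io.IOBase and returns False.


raw = NonSeekableRaw(b"line one\nline two\n")
text = io.TextIOWrapper(io.BufferedReader(raw), encoding="utf-8")
lines = text.readlines()
print(lines)  # ['line one\n', 'line two\n']
```

Sequential, line-oriented reading works fine on such a stream; only random access is missing.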

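import io
import boto3

s3 = boto3.client('s3')
body = s3.get_object(Bucket='boto3', Key='test.mdf')['Body']  # botocore StreamingBody
# _raw_stream is a private attribute holding the underlying HTTP response
buff_reader = io.BufferedReader(body._raw_stream)
print(buff_reader)  # <_io.BufferedReader>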

However, this buff_reader is not seekable, which makes mdfreader fail, because its file operations need the seek() method.
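To reproduce the failure without S3, the sketch below uses `NonSeekableRaw`, a hypothetical stand-in for the HTTP stream, with made-up sample bytes. Sequential `read()` calls succeed, but `seek()` raises `io.UnsupportedOperation`, presumably the same kind of error a seek-based reader such as mdfreader runs into.

```python
import io


class NonSeekableRaw(io.RawIOBase):
    """Hypothetical stand-in for the raw HTTP stream behind StreamingBody."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        n = min(len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


buff_reader = io.BufferedReader(NonSeekableRaw(b"0123456789"))
print(buff_reader.seekable())  # False
first_bytes = buff_reader.read(8)  # sequential reads still work: b'01234567'

seek_error = None
try:
    buff_reader.seek(0)  # a random-access jump back to the start
except io.UnsupportedOperation as exc:
    seek_error = exc
print(type(seek_error).__name__)  # UnsupportedOperation
```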

stream a non-seekable file-like object

stdio stream to seekable file-like object

So I thought about turning the BufferedReader into a seekable file-like object. First, I needed to understand why it is not seekable. A BufferedReader is only as seekable as the raw stream it wraps: its seekable() reports raw.seekable(). A BufferedReader returned by open(path, 'rb') wraps a FileIO and can seek, but the raw stream behind StreamingBody is an urllib3 HTTP response, which cannot. From the buffered streams design: BufferedRandom is only suitable when the file is open for reading and writing. The 'rb' and 'wb' modes should return BufferedReader and BufferedWriter, respectively.
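A quick stdlib check that seekability comes from the raw stream rather than from the BufferedReader class itself. The temporary file and its contents are arbitrary, and `NonSeekableRaw` is again a hypothetical stand-in for the HTTP stream.

```python
import io
import os
import tempfile


class NonSeekableRaw(io.RawIOBase):
    """Hypothetical stand-in for a non-seekable raw stream."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        n = min(len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"abcdef")

    with open(path, "rb") as f:  # 'rb' returns a BufferedReader over a FileIO
        rb_type = type(f).__name__
        rb_seekable = f.seekable()  # True: FileIO on a regular file can seek
        f.seek(3)
        rb_tail = f.read()  # b'def'

    with open(path, "r+b") as f:  # read+write mode returns a BufferedRandom
        rw_type = type(f).__name__

stream_reader = io.BufferedReader(NonSeekableRaw(b"abcdef"))
stream_seekable = stream_reader.seekable()  # False: the raw stream cannot seek

print(rb_type, rb_seekable, rw_type, stream_seekable)
# BufferedReader True BufferedRandom False
```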

Is it possible to first read() the content of the BufferedReader into memory, then transfer it to a BufferedRandom? That led me to try BufferedReader.read(), which basically reads all the bytes and stores them in memory (call it in-memoryA). Then the good news: in-memory binary streams are also available as BytesIO objects:

f = io.BytesIO(b"some in-memory binary data")
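A short check of what BytesIO offers, using the same sample bytes as the line above: it supports random access with seek() and tell(), which is exactly what the streaming reader lacked.

```python
import io

f = io.BytesIO(b"some in-memory binary data")
f.seek(5)            # jump to byte offset 5
word = f.read(9)     # b'in-memory'
position = f.tell()  # 14
f.seek(0)            # rewind to the start
whole = f.read()
print(f.seekable(), word, position)  # True b'in-memory' 14
```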

What if I wrap in-memoryA in a BytesIO? That really does give me a seekable object:

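import io

# mf4_ is the response dict returned by the S3 client's get_object()
fid_ = io.BufferedReader(mf4_['Body']._raw_stream)
read_in_memory = fid_.read()       # pull the whole object into memory as bytes
bio_ = io.BytesIO(read_in_memory)  # seekable in-memory file-like object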
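Reading the whole object with fid_.read() keeps the entire file in memory. For large files, one alternative (a swapped-in technique, not the approach described above) is to copy the stream in chunks into a `tempfile.SpooledTemporaryFile`, which stays in memory up to `max_size` bytes and then rolls over to a temporary file on disk; either way the result can seek. `to_seekable` and `NonSeekableRaw` are hypothetical names, and the stand-in stream replaces the real S3 body so the sketch runs offline.

```python
import io
import shutil
import tempfile


class NonSeekableRaw(io.RawIOBase):
    """Hypothetical stand-in for the raw HTTP stream behind StreamingBody."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        n = min(len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


def to_seekable(stream, max_in_memory=64 * 1024 * 1024):
    """Hypothetical helper: copy a non-seekable binary stream into a seekable spool.

    Data is kept in memory until it exceeds max_in_memory bytes, then spills to disk.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    shutil.copyfileobj(stream, spool)  # copies in fixed-size chunks via read(n)
    spool.seek(0)  # rewind so the consumer starts at byte 0
    return spool


data = b"header" + b"\x00" * 10 + b"payload"
with to_seekable(io.BufferedReader(NonSeekableRaw(data)), max_in_memory=8) as fobj:
    fobj.seek(16)        # jump past the 6-byte header and 10 padding bytes
    tail = fobj.read()   # b'payload'
    fobj.seek(0)
    head = fobj.read(6)  # b'header'
```

Because `shutil.copyfileobj` only calls `read(size)` on its source, the StreamingBody itself (e.g. `mf4_['Body']`) should in principle work as the `stream` argument without touching the private `_raw_stream`; that is an untested assumption here.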

The BytesIO object is then much more file-like: it supports both read() and seek(), which is what mdfreader needs.

refer

boto3 doc

boto3 s3 api samples

mdf4wrapper

iftream to FILE

what is the concept behind file pointer or stream pointer

using io.BufferedReader on a stream obtained with open

working with binary data in python

read binary file and loop over each byte

smart_open project

PEP 3116 – New I/O