background
during ADS road tests, TB~PB amounts of data are generated, e.g. sensor raw data (rosbag, mf4), images, point clouds, etc. the previous few blogs focused on data storage. for a more robust and user-friendly data/file server, we also need to consider the database and UI.
from the end-user/engineer's viewpoint, a few basic functions are required:
- query certain features and view the data/files filtered
- download a single/a batch of interested files (for further usage or analysis)
- upload a large amount of files quickly to storage (mostly by admin users)
an FTP server to support s3
traditionally, FTP is commonly used to download many large files. the pros and cons of HTTP vs FTP for transferring files: HTTP is more responsive for request-response of small files, but FTP may be better for large files if tuned properly; nowadays, though, most prefer HTTP. searching a little more, there are a lot of discussions about connecting amazon s3 to an FTP server:
- transfer files from s3 storage to ftp server
- FTP server using s3 as storage
- using S3 as storage for attachments in a web-mail system
- FTP/SFTP access to amazon s3 bucket
and there are popular FTP clients which support s3, e.g. WinSCP, Cyberduck; of course, aws has its own sftp client, as well as an aws s3 browser windows client. for more client tools check here
however, ftp can’t do metadata queries. for some cases, e.g. resimulation of all stored scenarios, where each scenario is handled the same way, we can grab the scenarios one by one and send each to the resimulator; but for many other cases, we need a certain pattern of data rather than reading the whole storage, and then a sql filter is much more efficient and helpful. so a simple FTP server is not enough in these cases.
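as a sketch of what such a filter buys us: assuming a hypothetical `scenario_files` metadata table (the table and column names here are illustrative, not from the original), one query replaces a full scan of the bucket:

```javascript
// hypothetical metadata table and columns, for illustration only:
// pick highway scenarios above a size threshold instead of
// walking the whole s3 storage over ftp
const sql = `
  SELECT s3_path, size_bytes
  FROM scenario_files
  WHERE scenario_type = 'highway'
    AND size_bytes > 100 * 1024 * 1024
  ORDER BY recorded_at DESC
`;
console.log(sql.trim());
```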
s3 objects/files to db
starting from a common browser-server (bs) framework, e.g. react + nodejs, where nodejs can talk to the db as well:
- nodejs queries bucket/object header info from the s3 server, and updates this metadata into the db.
there is a great discussion about storing images in db - yea or nay: when managing many TB of images/mdf files, storing file paths in the db is the best solution:
- db storage is more expensive than file system storage
- you can super-accelerate file system access: e.g. the os sendfile() system call asynchronously sends a file directly from the fs to the network interface, which sql can’t do
- the web server needs no special coding to access images in the fs
- db wins out where transactional integrity between an image/file and its metadata is important, since it’s more complex to manage integrity between db metadata and fs data; and it’s difficult to guarantee data has been flushed to disk in the fs
so for this file server, the metadata includes the file-path-in-s3, and other user-interested items.
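a minimal sketch of that mapping: one entry from an s3 listing (shaped like the objects `listObjectsV2` returns) becomes one db row. every field except the s3 path is an illustrative assumption, not a fixed schema:

```javascript
// map one s3 listing entry to a db metadata row;
// all fields besides s3Path are illustrative choices
function toMetadataRow(bucket, obj) {
  const key = obj.Key;
  return {
    s3Path: `s3://${bucket}/${key}`, // file-path-in-s3: the core column
    fileName: key.split('/').pop(),
    fileType: key.includes('.') ? key.split('.').pop().toLowerCase() : '',
    sizeBytes: obj.Size,
    lastModified: obj.LastModified,
  };
}

// example listing entry
const row = toMetadataRow('ads-data', {
  Key: 'road-test/2021-01-01/run1.mf4',
  Size: 1024,
  LastModified: '2021-01-01T12:00:00Z',
});
console.log(row.s3Path); // s3://ads-data/road-test/2021-01-01/run1.mf4
```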
- during a browser user's query/list request, nodejs talks to the db, which is a normal bs case.
- when the browser user wants to download a certain file, nodejs parses the file metadata and talks to s3.
nodejs to s3
nodejs fs.readFile()
taking an example from the official nodejs fs doc:
if not reading the file directly, the fs.ReadStream class is another good choice to read an s3 streaming object. fs.readFile() and fs.readFileSync() both read the full content of the file into memory before returning the data, which means big files will have a major impact on memory consumption and the execution speed of the program. another choice is fs-readfile-promise.
express res.download
the res object represents the HTTP response that an Express app sends when it gets an HTTP request; see expressjs res.download
aws sdk for nodejs
taking an example from the aws sdk for js:
check the aws sdk for nodejs api docs for more details.
in summary
whether to use an FTP server or a nodejs server depends on the upper usage cases:
for a single large-size (>100MB) file (e.g. mf4, rosbag) download, nodejs with a db is ok, as the db helps to filter out the file first, and a download of a few minutes is acceptable.
for downloads of many little-size (~1MB) files (e.g. image, json), nodejs is strong without doubt.
for downloading/uploading many large-size files, a friendly UI is not necessary compared to performance, so FTP may be the solution.