
real-time matlab/simulink code generation

Posted on 2020-03-20 |

real-time matlab code generation

background

Python and Linux vs Matlab and Windows: I prefer the former. but working in a team, I have to understand how matlab/Simulink code generation works, especially for real-time models.

Simulink Coder

previously named Real-Time Workshop (RTW).

real time model data structure

access rtModel data by using a set of macros analogous to the ssSetxxx and ssGetxxx macros that S-functions use to access SimStruct data, including noninlined S-functions compiled by the code generator.

You need to use the set of macros rtmGetxxx and rtmSetxxx to access the real-time model data structure. The rtModel is an optimized data structure that replaces SimStruct as the top level data structure for a model. The rtmGetxxx and rtmSetxxx macros are used in the generated code as well as from the main.c or main.cpp module.

usage of the rtmGetxxx and rtmSetxxx macros is the same as for the ssSetxxx and ssGetxxx versions, except that you replace the SimStruct S with the real-time model data structure rtM.

| rtm macro | description |
| --- | --- |
| rtmGetdX(rtm) | get the derivatives of block continuous states |
| rtmGetNumSampleTimes(RT_MDL rtM) | get the number of sample times that a block has |
| rtmGetSampleTime(RT_MDL rtM, int TID) | get task sample time |
| rtmGetStepSize(RT_MDL) | return the fundamental step size of the model |
| rtmGetT(RT_MDL, t) | get the current simulation time |
| rtmGetErrorStatus(rtm) | get the current error status |
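for example, in a handwritten main.c these macros read model state directly; a small sketch (xx_M stands for the generated rtModel instance, as in the generated code below):

time_T step = rtmGetStepSize(xx_M);    /* fundamental step size of the model */
int_T nst = rtmGetNumSampleTimes(xx_M); /* how many sample times the model has */
if (rtmGetErrorStatus(xx_M) != (NULL)) {
    /* the model raised an error: stop stepping the model */
}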

code generation to be used externally:

1) Install the Real-Time Workshop (RTW) Toolbox for MATLAB;

2) Create the Simulink Model and Prepare it for autocoding;

3) Correctly configure the RTW options and include a *.tcl file;

4) Build any S-Functions of the model;

5) Build the model (generate autocode including makefile);

6) Tune up the makefile with any missing options/libraries/files;

7) Integrate autocoded model in RTEMS using the wrapper.

ert_main()

the following is a common skeleton of the C code generated for a real-time model.

void rt_OneStep(void){
    simulation_custom_step();
}

int main(void){
    simulation_initialize();
    while(rtmGetErrorStatus(xx_M) == (NULL)){
        rt_OneStep();
    }
    simulation_terminate();
    return 0;
}

if no timer interrupt is set up, there is a warning: “The simulation will run forever. Generated ERT main won’t simulate model step behavior. To change this behavior select the ‘MAT-file logging’ option.”

when translating a real-time Matlab/Simulink model to C/C++ code, we need to manually set the while-loop break condition; for example, we can run 100 steps, or break based on some event trigger.

Timing

struct {
    uint16_T clockTick1;      // base rate counter (5s)
    struct {
        uint16_T TID[2];
    } TaskCounters;           // subtask counter (0.02s)
} Timing;

currentTime = Timing.clockTick1 * 5.0;

the absolute timer for sample time [5.0s, 0.0s]: the resolution of this integer timer is 5.0, which is the step size of the task. so basically, one base-rate step covers 5s of simulation time, while inside the model each internal step is 0.02s.

if we run a 20s test scenario, we get 4 clockTick1 increments (20s / 5.0s), and inside each clockTick1 there are 250 internal steps (5.0s / 0.02s).

a few modifications

the following are a few modifications based on the auto-generated C code:

  • redefine data structure

matlab coder mostly uses C structs to package signals. in our adas model, most signals have similar inner items, so first I’d like to define a parent structure, then define all the other adas signals using typedef:

typedef struct parent adas_sig1;
typedef struct parent adas_sig2;
typedef struct parent adas_sig3;
typedef struct parent adas_sig4;

with the parent struct, we can define one method to handle all adas signals.
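a minimal sketch of this idea; the parent layout below (a timestamp plus a fixed-size value array) and the helper gd1_struc_2arr are illustrative assumptions, not the exact production definitions:

#include <stdlib.h>

/* hypothetical parent layout; the real adas signals carry different fields */
struct parent {
    double timestamp;
    double vals[20];
};

typedef struct parent adas_sig1;
typedef struct parent adas_sig2;

/* one handler serves every signal typedef'd from parent:
   flatten the struct payload into a plain double array */
double *gd1_struc_2arr(const struct parent *sig) {
    double *out = malloc(20 * sizeof(double));
    for (int i = 0; i < 20; i++) {
        out[i] = sig->vals[i];
    }
    return out;
}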

  • add trigger model in rt_oneStep()
checkTriggerSigs(&FunctionSignal, outputs_name);
int idx = 0;
for(; idx < outputs_name.size(); idx++){
    if (outputs_name[idx] == "adas_sig1") {
        double *vals = gd1_struc_2arr(adas_sig1);
        outputs_data.push_back(vals);
    }
    else if (outputs_name[idx] == "adas_sig2") {
        double *vals = gd1_struc_2arr(adas_sig2);
        outputs_data.push_back(vals);
    }
    // ws msg send model
}
  • add sim_time to break the loop
real_T sim_time = 5.0;
fflush(NULL);
while (rtmGetErrorStatus(OpenLoopSimulation_M) == (NULL) &&
       (Timing.clockTick1) * 5.0 <= sim_time) {
    rt_OneStep();
}

in summary

the work above is the core modification needed to translate a real-time matlab/simulink model with a trigger model into C/C++ code, which can be integrated into the massively adas test pipeline.

refer

matlab code generation from rtems

the joy of generating C code from MATLAB

matlab coder introduction

matlab code doc

real-time model data structure

generate code from rate-based model

schedule a subsystem multiple times in a single step

massively adas test pipeline

Posted on 2020-03-18 |

background

ADAS engineers in OEMs use Matlab/Simulink (model-based design) to develop adas algorithms, e.g. aeb. the Simulink way is enough for function verification at L2 and below; for L2+ scenarios, we often need to test these adas functions at system level, basically testing them in as many scenarios as possible, kind of an L3 requirement.

one way to do system-level verification is through replay: the test vehicle fleet collects a large amount of data, which is then fed into a test pipeline to check whether these adas functions trigger well or miss.

for replay system test, we handle a large amount of data, e.g. Pb scale, so the Simulink model run is too slow. an adas test pipeline with the ability to run massively is required.

the previous blog high concurrent ws server mentioned the architecture of this massively adas test pipeline: each adas model is integrated with a websocket client, and all of these ws clients talk to one websocket server, which has an api to write to the database.

encode in C

the adas simulink model can be encoded in C (and of course also as C++, though that path is not as mature yet); the model in C is more scalable than in simulink/matlab.

matlab/simulink code generation has a few target choices; since massive runs happen mostly on Linux-like os, here we choose the ert target to encode the model, after which we can build and test as:

gcc -c adas_model.c -I .
gcc -c ert_main.c
gcc ert_main.o adas_model.o -o mytest

json-c

as all messages of the adas model in C are stored in program memory, the first thing is to serialize them to json. here we choose json-c:

  • install on local Ubuntu
sudo apt-get install autoconf automake libtool
sh autogen.sh
./configure
make && make install

then the json-c header is at:

/usr/local/include/json-c

and the libs at:

/usr/local/lib/libjson-c.so and *.la

when using it, we can add the following flags:

JSON_C_DIR=/path/to/json_c/install
CFLAGS += -I$(JSON_C_DIR)/include/json-c
LDFLAGS += -L$(JSON_C_DIR)/lib -ljson-c

the json object can be created as:

struct json_object *json_obj = json_object_new_object();
struct json_object *json_arr = json_object_new_array();
struct json_object *json_string = json_object_new_string(name);
int i = 0;
for(; i < 20; i++){
    struct json_object *json_double = json_object_new_double(vals[i]);
    json_object_array_put_idx(json_arr, i, json_double);
}
json_object_object_add(json_obj, "name", json_string);
json_object_object_add(json_obj, "signals", json_arr);

modern c++ json libs are prettier, e.g. jsoncpp, rapidJSON, json for modern c++
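for comparison, the same name-plus-signals payload with json for modern c++ is shorter; a sketch, assuming vals is a 20-element double array as above:

#include <string>
#include <vector>
#include <nlohmann/json.hpp>
using json = nlohmann::json;

json j;
j["name"] = name;                                    // same "name" field as the json-c sample
j["signals"] = std::vector<double>(vals, vals + 20); // containers convert directly
std::string payload = j.dump();                      // serialize to a string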

wsclient-c

the first ws client I tried was wsclient in c; with the default install, the headers and libs can be found at:

/usr/local/include/wsclient
/usr/local/lib/libwsclient.so or *.a

when using:

gcc -o client wsclient.c -I/usr/local/include -L/usr/local/lib/ -lwsclient

onopen()

as we need to send custom messages through this client, and the default message is sent inside onopen(), I have to add an additional char* argument to the default onopen function pointer

int onopen(wsclient *c, char *message) {
    libwsclient_send(c, message);
    return 0;
}

void libwsclient_onopen(wsclient *client, int (*cb)(wsclient *c, char *), char *msg) {
    pthread_mutex_lock(&client->lock);
    client->onopen = cb;
    pthread_mutex_unlock(&client->lock);
}

// inside the library, the handshake thread is created like this:
if(pthread_create(&client->handshake_thread, NULL, libwsclient_handshake_thread, (void *)client)) {

and the onopen callback is actually executed inside the handshake thread, where it is not straightforward to pass the char* message in; further, as there is no global alive status telling whether the client-server channel is alive, calling libwsclient_send() from another thread sounds trouble-prone.

wsclient-c looks limited, so I switched to a c++ wsclient; but I need to make sure the model in c still works with g++.

wsclient c++

websocketpp is a header-only C++ library; there are no libs after the build, but it depends on boost. so to use this lib, we can add headers and libs as follows:

/usr/local/include/websocketpp
/usr/lib/x86_64-linux-gnu/libboost_*.so

I am using the wsclient from the sample, and define a public method as the client process:

int client_process(std::string& server_url, std::string& message){
    websocket_endpoint endpoint;
    int connection_id = endpoint.connect(server_url);
    if (connection_id != -1) {
        std::cout << "> Created connection with id " << connection_id << std::endl;
    }
    connection_metadata::ptr mtdata = endpoint.get_metadata(connection_id);
    //TODO: optimize this sleeping time
    boost::this_thread::sleep(boost::posix_time::milliseconds(200));
    int retry_num = 0;
    while(mtdata->get_status() != "Open" && retry_num++ < 100){
        std::cout << "> connection is not open " << connection_id << std::endl;
        boost::this_thread::sleep(boost::posix_time::milliseconds(100));
        connection_id = endpoint.connect(server_url);
        mtdata = endpoint.get_metadata(connection_id);
    }
    if(mtdata->get_status() != "Open") {
        std::cout << "retry failed, exit -1" << std::endl;
        return -1;
    }
    endpoint.send(connection_id, message);
    std::cout << message << " sent successfully" << std::endl;
    return 0;
}
there is a more elegant retry client solution.

to build our wsclient:

g++ wsclient.cpp -o ec -I/usr/local/include -L/usr/lib/x86_64-linux-gnu -lpthread -lboost_system -lboost_random -lboost_thread -lboost_chrono
ws server in python

we implemented a simple ws server with websockets:
#!/usr/bin/env python3
import asyncio
import websockets
import json
from db_model import dbwriter, adas_msg, Base

class wsdb(object):
    def __init__(self, host=None, port=None):
        self.host = host
        self.port = port
        self.dbwriter_ = dbwriter()

    async def process(self, websocket, path):
        try:
            raw_ = await websocket.recv()
            jdata = json.loads(raw_)
            orm_obj = adas_msg(jdata)
            try:
                self.dbwriter_.write(orm_obj)
                self.dbwriter_.commit()
            except Exception as e:
                self.dbwriter_.rollback()
                print(e)
        except Exception as e:
            print(e)
        greeting = "hello from server"
        await websocket.send(greeting)
        print(f"> {greeting}")

    def run(self):
        if self.host and self.port:
            start_server = websockets.serve(self.process, self.host, self.port)
        else:
            start_server = websockets.serve(self.process, "localhost", 8867)
        asyncio.get_event_loop().run_until_complete(start_server)
        asyncio.get_event_loop().run_forever()

if __name__ == "__main__":
    test1 = wsdb()
    test1.run()

the simple orm db_writer is built on a sqlalchemy model.
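for reference, a minimal sketch of what that db_model module could look like, assuming a postgres backend; the table name and columns here are illustrative, not the exact production schema:

# db_model.py -- a minimal sketch, not the exact production schema
from sqlalchemy import create_engine, Column, Integer, String, JSON
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class adas_msg(Base):
    __tablename__ = "adas_msg"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    payload = Column(JSON)

    def __init__(self, jdata):
        self.name = jdata.get("name")
        self.payload = jdata

class dbwriter(object):
    def __init__(self, url="postgresql://user:pwd@localhost/adasdb"):
        engine = create_engine(url)
        Base.metadata.create_all(engine)      # create the table if missing
        self.session = sessionmaker(bind=engine)()

    def write(self, orm_obj):
        self.session.add(orm_obj)

    def commit(self):
        self.session.commit()

    def rollback(self):
        self.session.rollback()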

in makefile

CC = g++
JSONC_IDIR = /usr/local/include
CFLAGS = -I. -I$(JSONC_IDIR)
OPEN_LOOP_DEPS = rtwtypes.h adas_test.h
LDIR = -L/usr/local/lib/
LIBS = -ljson-c
BOOST_LDIR = -L/usr/lib/x86_64-linux-gnu
BOOST_LIBS = -pthread -lboost_system -lboost_random -lboost_thread -lboost_chrono
JSON_DEPS = wsclientpp.h
obj = adas_test.o ert_main.o
src = *.c

$(obj): $(src) $(OPEN_LOOP_DEPS) $(JSON_DEPS)
	$(CC) -c $(src) $(CFLAGS)

mytest: $(obj)
	$(CC) -o mytest $(obj) $(CFLAGS) $(LDIR) $(LIBS) $(BOOST_LDIR) $(BOOST_LIBS)

.PHONY: clean
clean:
	rm -f *.o

so now we have encoded the adas simulink model into C code, and integrated this model C code with a websocket client, which talks to a ws server, which in turn writes to a database, which can further feed a data analysis model.

we can add a front-end web UI and a system monitor UI if needed, but so far this adas test pipeline can support a few hundred adas test cores running concurrently.

refer

pthread_create with multi args

wscpp retry client

stanford cs linked list problems

Posted on 2020-03-14 |

background

the stanford cs handouts give me some confidence to deal with linked lists.

linked list basics

basic pointers, as also covered in a previous blog:

  • a pointer stores a reference to another variable (the pointee). what is stored inside a pointer is a reference of its pointee’s type.

  • a NULL pointer points to nothing.

  • pointer assignment p=q makes the two pointers point to the same pointee; namely, the two pointers point to the same pointee memory.

it’s a good habit to remember to check the empty list case.

define the Linked-List structure:

struct ListNode {
    int val;
    ListNode *next;
    ListNode(int x): val(x), next(NULL){}
};
typedef ListNode node_;
  • iterate over the list with a local pointer
node_ *current = head;
while(current){
    current = current->next;
}
// or equivalently:
for(current = head; current != NULL; current = current->next){}
  • push a new node to the front of the list
void Push(ListNode** headRef, int val){
    ListNode *newNode = new ListNode(val);  // new returns ListNode*, so newNode must be a pointer
    newNode->next = *headRef;
    *headRef = newNode;
}

  • changes the head pointer by a reference pointer

void changetoNull(ListNode** head){
    *head = NULL;
}
  • build up a list by adding nodes at its head
ListNode* AddatHead(){
    ListNode *head = NULL;
    for(int i=1; i<5; i++){
        Push(&head, i);
    }
    return head;
}

which gives a list {4, 3, 2, 1}

  • build up a list by appending nodes to the tail

ListNode* BuildwithSpecialCase(){
    ListNode* head = NULL;
    ListNode* tail;
    Push(&head, 1);
    tail = head;
    for(int i=2; i<5; i++){
        Push(&(tail->next), i);
        tail = tail->next;
    }
    return head;
}

which gives a list {1, 2, 3, 4}

  • build up a list with dummy node

ListNode* BuildWithDummy(){
    ListNode dummy(0);          // stack-allocated dummy node, not new'd
    ListNode* tail = &dummy;
    for(int i=1; i<5; i++){
        Push(&(tail->next), i);
        tail = tail->next;
    }
    return dummy.next;
}

which returns a list {1, 2, 3, 4}

  • appendNode(), add new node at the tail
void appendNode(ListNode** headRef, int val){
    ListNode *current = *headRef;
    ListNode *newNode = new ListNode(val);
    if(!current){
        *headRef = newNode;      // empty list: the new node becomes the head
    }else{
        while(current->next){
            current = current->next;
        }
        current->next = newNode;
    }
}

  • copyList()

ListNode* copyList(ListNode* head){
    ListNode *current = head;
    ListNode *newList = NULL;
    ListNode *tail = NULL;
    while(current){
        if(!newList){
            newList = new ListNode(current->val);
            tail = newList;
        }else{
            tail->next = new ListNode(current->val);
            tail = tail->next;
        }
        current = current->next;   // advance, otherwise this loops forever
    }
    return newList;
}
  • copyList() recursive
ListNode* CopyList(ListNode* head){
    if(!head) return NULL;
    ListNode *newList = new ListNode(head->val);
    newList->next = CopyList(head->next);
    return newList;
}

further linked list problems

  • InsertNth()
void insertNth(node_ **head, int index, int data){
    if(index == 0) Push(head, data);
    else{
        node_ *cur = *head;
        for(int i=0; i<index-1; i++){
            cur = cur->next;
        }
        Push(&(cur->next), data);
    }
}
  • sortedInsert()
// note the **: we pass a pointer to the head pointer, since the head node may be updated
void sortedInsert(node_ **head, node_ *newNode){
    if(*head == NULL || (*head)->val >= newNode->val){
        newNode->next = *head;
        *head = newNode;
    }else{
        node_ *cur = *head;
        while(cur->next && cur->next->val < newNode->val){
            cur = cur->next;
        }
        newNode->next = cur->next;
        cur->next = newNode;
    }
}

// with dummy head
void sortedInsert2(node_ **head, node_ *newNode){
    node_ dummy(0);
    dummy.next = *head;
    node_ *cur = &dummy;
    while(cur->next && cur->next->val < newNode->val){
        cur = cur->next;
    }
    newNode->next = cur->next;
    cur->next = newNode;
    *head = dummy.next;
}
  • insertSort()

given an unsorted list, output a sorted list

void InsertSort(node_ **head){
    node_ *result = NULL;
    node_ *cur = *head;
    node_ *next;
    while(cur){
        next = cur->next;       // tricky: save the next pointer before we change it
        sortedInsert(&result, cur);
        cur = next;
    }
    *head = result;
}
  • append()

append list b to the end of list a

void append(node_ **a, node_ **b){
    node_ *cur;
    if(*a == NULL){ *a = *b; }
    else{
        cur = *a;
        while(cur->next){
            cur = cur->next;
        }
        cur->next = *b;
    }
    *b = NULL;
}
  • frontBackSplit()

given a list, split it into two sublists: one for the front half, one for the back half

void frontBackSplit(node_ **head, node_ **front, node_ **back){
    int len = length(*head);    // length() counts the nodes
    node_ *cur = *head;
    if(len < 2){
        *front = *head;
        *back = NULL;
    }else{
        int half = len/2;
        for(int i=0; i<half-1; i++){
            cur = cur->next;
        }
        *front = *head;
        *back = cur->next;      // the back half starts after the split point
        cur->next = NULL;       // terminate the front half
    }
}

void frontBackSplit2(node_ **head, node_ **front, node_ **back){
    node_ *fast, *slow;
    if(*head == NULL || (*head)->next == NULL){
        *front = *head;
        *back = NULL;
    }else{
        slow = *head;
        fast = (*head)->next;
        while(fast){
            fast = fast->next;
            if(fast){
                fast = fast->next;
                slow = slow->next;
            }
        }
        *front = *head;
        *back = slow->next;
        slow->next = NULL;
    }
}
  • removeDuplicates()

remove duplicated nodes from a sorted list

void removeDuplicates(node_ **head)
{
    node_ *slow, *fast;
    if(*head == NULL || (*head)->next == NULL){
        return;
    }
    slow = *head;
    fast = (*head)->next;
    while(fast){
        if(slow->val == fast->val){
            node_ *needfree = fast;
            fast = fast->next;
            slow->next = fast;   // unlink the duplicate before freeing it
            free(needfree);
        }else{
            slow = slow->next;
            fast = fast->next;
        }
    }
}

void removeDuplicate(node_ **head){
    node_ *cur = *head;
    if(cur == NULL) return;
    while(cur->next){
        if(cur->val == cur->next->val){
            node_ *nextNext = cur->next->next;
            free(cur->next);
            cur->next = nextNext;
        }else{
            cur = cur->next;
        }
    }
}
  • moveNode()

given two list, remove the front node from the second list, and push it onto the front of the first.

// a = {1, 2, 3}; b = {1, 2, 3} => a={1, 1, 2, 3}, b={2, 3}

void moveNode(node_ **dest, node_ **source){
    node_ *newNode = *source;
    *source = newNode->next;
    newNode->next = *dest;
    *dest = newNode;
}
  • alternatingSplit()

given a list, split its nodes into two shorter lists. if we number the elements 0, 1, 2, …, then all the even elements go to the first sublist, and all the odd elements go to the second

void alternatingSplit(node_ *source, node_ **ahead, node_ **bhead){
    node_ *a = NULL;
    node_ *b = NULL;
    node_ *cur = source;
    while(cur){
        moveNode(&a, &cur);
        if(cur){
            moveNode(&b, &cur);
        }
    }
    *ahead = a;
    *bhead = b;
}
  • shuffleMerge()

given two lists, merge their nodes together to make one list, taking nodes alternately between the two lists

node_* shuffleMerge(node_ *a, node_ *b){
    node_ *res = NULL;
    int i = 0;
    while(a || b){
        if(i % 2 == 0 && a){
            moveNode(&res, &a);
        }else if(b){
            moveNode(&res, &b);
        }
        i++;
    }
    return res;
} // but this gives the tail-to-front order

node_* shuffleMerge2(node_ *a, node_ *b){
    node_ dummy(0);
    node_ *tail = &dummy;
    while(1){
        if(a == NULL){
            tail->next = b;
            break;
        }else if(b == NULL){
            tail->next = a;
            break;
        }else{
            tail->next = a;
            tail = a;
            a = a->next;
            tail->next = b;
            tail = b;
            b = b->next;
        }
    }
    return dummy.next;
}

// recursive version
node_* shuffleMerge3(node_ *a, node_ *b){
    node_ *res;
    node_ *recur;
    if(a == NULL) return b;
    else if(b == NULL) return a;
    else{
        recur = shuffleMerge3(a->next, b->next);
        res = a;
        a->next = b;
        b->next = recur;
        return res;
    }
}
  • sortedMerge()

given two lists sorted in increasing order, merge them into one list in increasing order

node_ *sortedMerge(node_ *a, node_ *b){
    node_ dummy(0);
    node_ *tail = &dummy;
    dummy.next = NULL;
    while(1){
        if(a == NULL){
            tail->next = b;
            break;
        }else if(b == NULL){
            tail->next = a;
            break;
        }
        if(a->val <= b->val){
            moveNode(&(tail->next), &a);
        }else{
            moveNode(&(tail->next), &b);
        }
        tail = tail->next;
    }
    return dummy.next;
}

// recursive version: the smaller head recurses on the rest
node_ *sortedMerge2(node_ *a, node_ *b){
    node_ *result = NULL;
    if(a == NULL) return b;
    if(b == NULL) return a;
    if(a->val <= b->val){
        result = a;
        result->next = sortedMerge2(a->next, b);
    }else{
        result = b;
        result->next = sortedMerge2(a, b->next);
    }
    return result;
}
  • mergeSort()
void mergeSort(node_ **headRef){
    node_ *head = *headRef;
    node_ *a, *b;
    if((head == NULL) || (head->next == NULL)){
        return;
    }
    frontBackSplit(&head, &a, &b);
    mergeSort(&a);
    mergeSort(&b);
    *headRef = sortedMerge(a, b);
}
  • reverse()
void reverse(node_ **head){
    node_ *res = NULL;
    node_ *cur = *head;
    node_ *next;
    while(cur){
        next = cur->next;
        cur->next = res;
        res = cur;
        cur = next;
    }
    *head = res;
}
  • recursive reverse()

// concerned

void recursiveReverse(node_ **head){
    node_ *first, *rest;
    if(*head == NULL) return;
    first = *head;
    rest = first->next;
    if(rest == NULL) return;
    recursiveReverse(&rest);
    first->next->next = first;  // rest's old tail (first->next) now points back to first
    first->next = NULL;
    *head = rest;
}

reference

linked list problems

linked list basics from stanford cs

hypervisor and gpu virtualization

Posted on 2020-03-14 |

background

to build a private data center for ads training and SIL, a few concepts are needed:

linux remote desktop

Linux GUI is based on the x-protocol, which is mostly CPU based. while deploying gpu-based apps in docker swarm, I have studied xserver; ssh -X channels give a way to visualize desktops and simple apps (e.g. glxgears) between xserver and xclient. but for really gpu-intensive apps, e.g. game rendering, machine learning, ssh -X can’t really use gpu resources well, which needs further support from the device management mechanisms of docker or k8s.

virtual machine / cloud desktop

a VM is based on a hypervisor, also known as a virtual machine monitor (VMM), a type of virtualization software that supports the creation and management of VMs. the hypervisor translates requests between the physical and virtual resources, making virtualization possible.

A hypervisor allows one host computer to support multiple guest VMs by virtually sharing its resources, like memory and processing.

Generally, there are two types of hypervisors. Type 1 hypervisors, called “bare metal,” run directly on the host’s hardware. Type 2 hypervisors, called “hosted,” run as a software layer on an operating system, like other computer programs, the most common e.g. vmware, citrix.

When a hypervisor is installed directly on the hardware of a physical machine, between the hardware and the operating system (OS), it is called a bare metal hypervisor, which separates the OS from the underlying hardware, the software no longer relies on or is limited to specific hardware devices or drivers.

VDI

one type of vm application is virtual desktop infrastructure (vdi): VDI hosts desktop environments on a centralized server and deploys them to end-users on request. this process is also known as server virtualization.

VDI does not necessarily require a physical gpu; without a gpu, we can still run vmware, but the performance is poor. for many other VM usages beyond VDI, we still need ways to access GPU devices in a VM.

GPU techs in VM

vmware has an introduction about gpu tech in VMs:

  • software 3D: using software to mimic GPU computation; all 3d calculation is done by the CPU

  • vsga (virtual shared graphics acceleration): each virtual desktop has a vsga driver, which talks to an ESXi module, inside which is a GPU driver that can call the physical gpu; this shared mode goes through the Hypervisor

  • vdga (virtual dedicated graphics acceleration), or pass-through mode: the physical GPU is assigned to one special virtual desktop for one user only

  • vgpu: split the physical GPU into a few virtual GPUs, so each virtual desktop can have one vGPU

VM vs docker

docker is a light-weight virtualization approach which doesn’t need a hypervisor, and an app in docker accesses the host’s physical devices directly, making docker look like a process rather than a virtual OS. scientific computing apps running in a VM actually use the virtual CPU from the hypervisor, which bypasses the cpu optimizations used for math calculation. so usually a physical machine can start hundreds or thousands of docker containers, but can only run a few VMs.

refer

linux remote desktop

understand vdi gpu tech

nvidia vgpu tech in vSphere

GPU hypervisors on OpenStack

containers vs hypervisors

pure c/c++ pointer

Posted on 2020-03-13 |

pointer assignment

#include <iostream>
using namespace std;

int main(){
    int a = 10;
    int *pa = &a;
    cout << "pa add " << &pa << "pa val " << *pa << endl;
    int b = 22;
    int *pb = &b;
    cout << "pb add " << &pb << "pb val " << *pb << endl;
    int d = 100;
    int *pd = &d;
    cout << "pd add " << &pd << "pd val " << *pd << endl;
    int *pc = pb; // pointer assignment
    cout << "pc add " << &pc << "pc val " << *pc << endl;
    pb = pd;
    cout << "pb add " << &pb << "pb val " << *pb << endl;
    cout << "pd add " << &pd << "pd val " << *pd << endl;
    pd = pa;
    cout << "pd add " << &pd << "pd val " << *pd << endl;
    pa = pc;
    cout << "pa add " << &pa << "pa val " << *pa << endl;
    cout << "pc add " << &pc << "pc val " << *pc << endl;
    return 0;
}
/*
pb add 0x7fff5e744b30pb val 22
pd add 0x7fff5e744b28pd val 100
pc add 0x7fff5e744b20pc val 22
pb add 0x7fff5e744b30pb val 100
pd add 0x7fff5e744b28pd val 100
pd add 0x7fff5e744b28pd val 10
pa add 0x7fff5e744b40pa val 22
pc add 0x7fff5e744b20pc val 22
*/

pointer assignment is used a lot in linked list problems; the pointer moves above mirror a pointer solution for linked-list reverse. consider a pointer as a container storing an address of another object. pointer assignment, e.g. pointerA = pointerB, only changes the content in the container; the address of the container itself doesn’t change. and by dereferencing (*) the pointer, we can see the content has changed.

further, take pc, pb, pd as another example.

pc = pb;
pb = pd;

the first line makes container pc store what is stored in container pb; in other words, the first line makes pc point to the address which is stored in pb.

and the second line then puts what’s stored in container pd into container pb.

after these two updates, pc points to the original content of pb; pb and pd point to the same content. obviously, what’s inside pc now has nothing to do with pointer pb anymore.

pointer++ forward

the basic scenario is as follows: will p2 move forward as well?

int *p1 = &int_var;
int *p2 = p1;
p1++;

we can see from the following test that p2 won’t move forward as p1++ executes.

#include <iostream>
using namespace std;

int main(){
    int a[4] = {1, 2, 3, 4};
    cout << " a addr " << &a << " a val " << *a << endl;
    int *p = a;
    int *q = p;
    cout << " p addr " << &p << " p val " << *p << endl;
    cout << " q addr " << &q << " q val " << *q << endl;
    for(int i=0; i<3; i++){
        q++;
    }
    cout << " a addr " << &a << " a val " << *a << endl;
    cout << " p addr " << &p << " p val " << *p << endl;
    cout << " q addr " << &q << " q val " << *q << endl;
    return 0;
}
/*
a addr 0x7fff5e968bb0 a val 1
p addr 0x7fff5e968b40 p val 1
q addr 0x7fff5e968b38 q val 1
a addr 0x7fff5e968bb0 a val 1
p addr 0x7fff5e968b40 p val 1
q addr 0x7fff5e968b38 q val 4
*/

pointer++ can be viewed as a same-type pointer at the current pointer’s neighbor address, followed by a pointer assignment:

int* tmp_p = (int*)0x7fff5e744b30;   // illustrative addresses only
int* p1 = (int*)0x7fff5e744b28;
p1 = tmp_p;

int++

#include <iostream>
#include <vector>
using namespace std;

int main(){
    vector<int> nums;
    for(int i=0; i<5; i++){
        nums.push_back(i);
    }
    int p = 0;
    std::cout << "nums[p++] " << nums[p++] << " p val " << p << endl;
    return 0;
}
/* nums[p++]: 0 p val: 1 */

nodejs introduction

Posted on 2020-03-12 |

callback / async

var fs = require("fs");
fs.readFile("input.txt", function(err, data){
    if(err) return console.error(err);
    console.log(data.toString());
});

the async function takes the callback function as its last parameter.

event loop

var events = require('events');
var emitter = new events.EventEmitter();
var connectH = function connected(){
    console.log("connected");
    emitter.emit('data_received'); // trigger 'data_received'
};
emitter.on('connection', connectH);
emitter.on('data_received', function(){
    console.log('data received');
});
emitter.emit('connection'); // trigger 'connection'

event emitter

when an async IO is done, an event is sent to the event queue, e.g. when fs.createReadStream() opens a file, an event is triggered, etc.

addListener(event, listener)
on(event, listener)        // listen
emit(event, [arg1], ...)   // trigger

file system

var fs = require("fs");
fs.open("input.log", "r+", function(err, fd){});
fs.stat("input.log", function(err, stats_info){});
fs.readFile("input.log", function(err, data){
    if(err){
        return console.error(err);
    }
    console.log(data.toString());
});
fs.writeFile("output.log", "some data", function(err){
    if(err){ return console.error(err); }
    console.log("write successfully");
});
fs.read(fd, buffer, ...)  // read a binary stream

buffer

as the js language originally handles only text data, Buffer is introduced to deal with binary data

Buffer.alloc(size)
Buffer.from(buffer || array || string)
buf.write(string)     // write to buffer
buf.toString()        // read from buffer
buf.toJSON()

stream

var fs = require("fs");
var readerStream = fs.createReadStream("input.file");
readerStream.on('data', function(chunk){});
var writeStream = fs.createWriteStream("output.file");
writeStream.on('finish', function(){});
readerStream.pipe(writeStream); // pipe from a reader stream to a writer stream

module system

to enable different nodejs files to use each other, there is a module system; a module can be a nodejs file, JSON, or compiled C/C++ code.

nodejs has exports and require, used to export a module’s APIs for external usage, or to access external APIs.

module.exports = function(){};
exports.method = function(){};

the first way exports the object itself; the second way only exports a certain method.
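a tiny usage sketch with two hypothetical files:

// hello.js
module.exports = function(){ console.log("hello"); };

// main.js
var hello = require("./hello");
hello(); // prints "hello"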

Global Object

console.log()
console.error()

common modules

  • path
var path = require("path");
path.join("/user/", "test1");
path.dirname(p_);
  • http server
var http = require("http");
http.createServer(function(request, response){
    var url_path = request.url;
    server(url_path, function(err, data){  // server() is the app's own resource handler
        if(err){
            console.log(err);
            response.writeHead(404, "xx");
        }else{
            response.writeHead(200, "yy");
            response.write(data.toString());
        }
        response.end();
    });
}).listen(8080);
  • http client
var http = require('http');
var url = "http://localhost:8080/index.html";
var callback = function(response){   // define the callback before requesting
    var body = '';
    response.on('data', function(data){
        body += data;
    });
    response.on('end', function(){
        console.log(body);
    });
};
var req = http.request(url, callback);
req.end();

Express

Express has request and response objects to handle requests and responses. express.static can handle static resources, e.g. images, css, etc.

var express = require("express");
var fs = require("fs");
var app = express();
app.use('/public', express.static('public'));
app.get('/index', function(req, res){});
app.get('/user', function(req, res){
    var response = {
        "name": req.query.name,
        "id": req.query.id
    };
    res.send(JSON.stringify(response));
});
app.post('/user', function(req, res){
    var response = {
        "name": req.body.name,
        "id": req.body.id
    };
    res.send(JSON.stringify(response));
});
app.post('/file_upload', function(req, res){
    var des_file = __dirname + "/" + req.files[0].originalname;
    fs.readFile(req.files[0].path, function(err, data){
        fs.writeFile(des_file, data, function(err){
            var response;
            if(err){ console.error(err); }
            else{
                response = {
                    message: "file uploaded successfully",
                    filename: req.files[0].originalname
                };
            }
            res.send(JSON.stringify(response));
        });
    });
});
var server = app.listen(8080, function(){});

res is what the server sends to the client, for both get and post methods. the req object represents the HTTP request and has properties for the request query string, parameters, body, HTTP headers, etc.

  • req.body

contains key-value pairs of data submitted in the request body. by default, it’s undefined, and is populated when using body-parsing middleware. e.g. body-parser

  • req.cookies

when using the cookie-parser middleware, this property is an object that contains the cookies sent by the request

  • req.path

contains the path part of the request url

  • req.query

an object containing a property for each query string parameter in the route

  • req.route

the currently matched route, a string

data access object(DAO)

the Dao pattern is used to separate low-level data accessing APIs or operations from high-level business services. usually there are three parts:

  • DAO interface, which defines the standard operations to be performed on a model object

  • DAO class, the class that implement DAO interfaces, this class is responsible to get data from database, or other storage mechanism

  • model object, a simple POJO containing get/set methods to store data retrieved using DAO class

o/r mapping (orm) is used a lot to map database items to a special class, and it’s easy to use; a little drawback of orm is that it assumes the database is normalized well. DAO is a middleware doing direct SQL mapping, which maps the SQL query output to a class.
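a tiny sketch of the three parts in nodejs (all names illustrative, with a mysqljs-style db.query):

// model object: a simple POJO with the data fields
function AdasMsg(id, name){
    this.id = id;
    this.name = name;
}

// DAO class: hides the SQL from the business layer
function AdasMsgDao(db){
    this.db = db;
}
AdasMsgDao.prototype.findById = function(id, cb){
    this.db.query("SELECT id, name FROM adas_msg WHERE id = ?", [id],
        function(err, rows){
            if(err) return cb(err);
            cb(null, new AdasMsg(rows[0].id, rows[0].name)); // map row -> model object
        });
};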

separating models, logic and daos

  • routes.js, where to put routes, usually referenced as controllers

  • models.js, where to put functions talk to database, usually referenced as dao layer

  • views.js

these three components can be put under app; all static data usually goes under the public folder; the Express package.json and index.js sit at the same level as app.

refer

nodejs at runoob.com

nodejs & mysql

bearcat-dao introduction

koa

chokidar

high concurrent ws log server

Posted on 2020-03-11 |

background

previously, we designed a logger module from ws to db; this blog is one real implementation of the python solution.

high concurrent ws server online

single server with multi clients: a simple C++

  • the server is started first and waits for incoming client calls, periodically reporting its status: how many clients are connected. meanwhile, once an incoming call is detected and accepted, the server creates a separate thread to handle that client; therefore, the server creates as many separate sessions as there are incoming clients.

  • how to handle multiple clients? once an incoming client call is received and accepted, the main server thread creates a new thread and passes the client connection to this thread

  • what if client threads need access to some global variables? a semaphore instance is helpful.

how the ws server handles multiple incoming connections

  • a socket object is often thought to represent a connection, but that is not entirely true, since sockets can be active or passive. a socket object in passive/listen mode is created by listen() for incoming connection requests. by definition, such a socket is not a connection; it just listens for connection requests.

  • accept() doesn’t change the state of the passive socket created by listen() previously, but it returns an active/connected socket, which represents a real connection. after accept() has returned the connected socket object, it can be called again on the passive socket, again and again; this is known as the accept loop.

  • but calling accept() takes time; can’t it miss incoming connection requests? it won’t: there is a queue of pending connection requests, handled automatically by the TCP/IP stack of the OS. meaning, while accept() can only deal with incoming connection requests one by one, no incoming request will be missed even when they arrive at a high rate. a minimal accept-loop sketch follows below.
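a minimal accept-loop sketch in python (plain TCP, one thread per accepted connection; host/port are illustrative):

import socket
import threading

def handle_client(conn, addr):
    # each accepted connection gets its own thread/session
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)  # echo back; stands in for real handling

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("localhost", 8888))
srv.listen(128)                # passive socket: it only listens
while True:
    conn, addr = srv.accept()  # returns a new active/connected socket
    threading.Thread(target=handle_client, args=(conn, addr), daemon=True).start()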

python env setup

the websockets module requires python3.6+, while the python version in the bare OS is 3.5, which gives:

File "/usr/lib/python3/dist-packages/websockets/compatibility.py", line 8
    asyncio_ensure_future = asyncio.async # Python < 3.5
                                        ^
SyntaxError: invalid syntax

basically, [asyncio.async](https://stackoverflow.com/questions/51292523/why-does-asyncio-ensure-future-asyncio-async-raise-a-syntaxerror-invalid-synt) is invalid syntax on newer python (async became a reserved keyword), and another dependent module [discord.py](https://github.com/Rapptz/discord.py/issues/1396) hits the same issue; working around that gives another error:

TypeError: unsupported operand type(s) for -=: 'Retry' and 'int'

the Retry error is fixed by adding the following lines to ~/.pip/pip.conf:

[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple

in a bare system with pre-installed python3.5, I did the following steps:

sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get install python3.7
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 2
sudo apt-get install python3-websockets
sudo apt-get install python3-websocket
sudo apt-get install python3-sqlalchemy
sudo pip3 install threadpool
sudo apt-get remove --purge python3-websocket # based on python3.5
sudo apt-get install python3-websocket # based on python3.7

which gives another error:

/var/lib/dpkg/info/python3-websocket.postinst: 6: /var/lib/dpkg/info/python3-websocket.postinst: py3compile: not found
dpkg: error processing package python3-websocket (--configure):
 subprocess installed post-installation script returned error exit status 127
Errors were encountered while processing:
 python3-websocket
E: Sub-process /usr/bin/dpkg returned an error code (1)

which is then fixed by going to /var/lib/dpkg/info/ and deleting all python3-websocket.* files:

sudo rm /var/lib/dpkg/info/[package_name].*
sudo dpkg --configure -a
sudo apt-get update

everything looks good, but it still reports:

ModuleNotFoundError: No module named 'websockets'

I gave up setting up with the bare python, then created a new conda env and ran the following settings inside, clean and simple:

pip install websockets
pip install websocket-client # websocket-client, rather than websocket
pip install threadpool
pip install sqlalchemy
pip install psycopg2

during remote tests, if the ws server goes down unexpectedly, we need to kill the ws pid:

sudo netstat -tlnp # find the running port and its pid
kill $pid

or `kill $(lsof -t -i:$port)`

websockets & websocket-client

the long-lived connection sample from [ws-client](https://github.com/websocket-client/websocket-client) is a good base; since here we need to test highly concurrent clients, we add `threadpool`:
import time
import json
import _thread as thread
import websocket
from threadpool import ThreadPool, makeRequests

# on_message / on_error / on_close are the usual small callbacks, omitted here

def on_open(ws, num):
    def run(*args):
        for i in range(3):
            time.sleep(1)
            message = "your_message"
            ws.send(json.dumps(message))
        time.sleep(1)
        ws.close()
        print("thread terminating...")
    thread.start_new_thread(run, ())

def on_start(num):
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp("ws://localhost:8888/",
                                on_message = on_message,
                                on_error = on_error,
                                on_close = on_close)
    ws.on_open = on_open(ws, num)
    ws.run_forever()

def threadpool_test():
    start_time = time.time()
    pool = ThreadPool(100)
    test = list()
    for itr in range(100):
        test.append(itr)
    requests = makeRequests(on_start, test)
    [pool.putRequest(req) for req in requests]
    pool.wait()
    print('%d second' % (time.time() - start_time))

in ws-client src, we see:

  • on_open: a callable object which is called at websocket opening. this function has one argument, this class object; but customized callback funcs can add more arguments, which is helpful.

  • on_message: a callable object which is called when data is received. on_message has 2 arguments: the 1st argument is this class object, the 2nd argument is the utf-8 string we get from the server.

we can implement a simple sqlalchemy orm db-writer, and add to the ws-server:

async def process(self, websocket, path):
    raw_ = await websocket.recv()
    jdata = json.loads(raw_)
    orm_obj = orm_(jdata)
    try:
        self.dbwriter_.write(orm_obj)
        print(jdata, "write to db successfully")
    except Exception as e:
        self.dbwriter_.rollback()
        print(e)

    greeting = "hello from server"
    await websocket.send(greeting)
    print(f"> {greeting}")

def run(self):
    if self.host and self.port:
        start_server = websockets.serve(self.process, self.host, self.port)
    else:
        start_server = websockets.serve(self.process, "localhost", 8867)
    asyncio.get_event_loop().run_until_complete(start_server)
    asyncio.get_event_loop().run_forever()

in summary

in reality, each ws-client is integrated into one upper application, which generates messages/logs and sends them to the ws-server, which writes them to the db; thanks to asyncio, the performance is good so far. in the future, we may need some buffering at the ws-server.

refer

a simple multi-client ws server

create a simple python ws server using Tornado

python: websockets

python: json

python: threadpool

design a logger from ws to db

Posted on 2020-03-09 |

background

so far, we have designed a file server from db to s3 and an mdf reader to feed data to the ads test module, which runs in SIMD mode, e.g. on k8s. for a closed-loop SIL, another component is needed: how to record the test messages/outputs to a database.

I first thought of an event-driven python web framework, e.g. pulsar, a popular python event-driven project; the lgsvl pythonAPI is also a good reference for transferring messages between a server and clients. so the basic idea is to integrate a websocket/tcp channel inside the aeb model.

websocket vs tcp

most ADAS models developed in OEMs are matlab based, and the most common communication protocol in Matlab is UDP. after a little review of tcp vs udp, I personally don’t like udp, no particular reason. so narrowing to TCP, the further question is: should it be raw TCP or Websocket? websocket enables a stream of messages instead of a stream of bytes. WebSockets is built on normal TCP sockets and uses frame headers that contain the size of each frame and indicate which frames are part of a message. The WebSocket API re-assembles the TCP chunks of data into frames, which are assembled into messages before invoking the message event handler once per message. WebSocket is basically an application protocol (with reference to the ISO/OSI network stack), message-oriented, which makes use of TCP as the transport layer. btw, TCP is bidirectional because it can send data in both directions, and it is full-duplex because it can do that simultaneously, without requiring line turnarounds, at the API level.

for ADAS test outputs, which are plain text, I’d prefer websocket. so that’s the decision.

matlab websocket client

there is a github project: matlab websocket, which is enough to implement a simple websocket client in Matlab; see the discussion at stackoverflow.

basically, here we bind a websocket client to the adas model, in matlab. this is a good choice: when running the adas model massively, the whole system looks like multiple ws clients concurrently communicating with one ws server, which is well separated from the adas model and can be freely implemented in nodejs, python, c# etc., totally friendly to third-party data analysis tools downstream.

multi-write db concurrently

once we get the messages from the ws clients, we need to transfer all of them to a database, e.g. MySQL. the first thing to understand is whether SQL can do concurrent inserts: if there are multiple INSERT statements, they are queued and performed in sequence, concurrently with the SELECT statements. MySQL doesn’t support parallel data inserts into the same table with the InnoDB or MyISAM storage engines.

this also answers another question: why not have the adas model write to the db directly? as the number of parallel running cases increases, writing to the DB becomes the bottleneck, so having the websocket middle layer is a good choice.

nodejs solution

nodejs is really powerful for implementing the solution above: basically a ws server plus a sql writer, e.g. mysqljs.

python solution

similar to nodejs, python has mature websocket modules and ORM modules to write to the db.

refer

liaoxuefeng tcp

zhihu mysql lock

nodejs connect mysql

asam mdf reader

Posted on 2020-03-07 |

previously I reviewed a few open source mdf readers and got the basic concepts of mdf. here is a working process with asammdf, step by step, to understand this lib.

the basic problem here is to read a specific signal’s data, without knowing its channel and group id.

first try

import re
from asammdf import MDF4

reader = MDF4(file_name)
data_dict = reader.info()
print("the total len of dict: ", len(data_dict))
print(data_dict.keys())
block_ = data_dict['group 117']
channel_metadata = reader.get_channel_metadata('t', 117)
channel117_comment = reader.get_channel_comment('t', 117)
channel117_name = reader.get_channel_name(117, 0)
pattern = "channel 123"
for key in block_:
    r1 = re.search(pattern, str(block_[key]), re.M|re.I)
    if r1:
        print(key)
data_ = reader.get("channel 123")  # doesn't work: it reports it can't find the Channel
data_ = reader.get("fus_lidar_camera", "group 123")  # doesn't work: it reports it can't find the Group
data_ = reader.get("fus_lidar_camera")  # works

so first, “channel 123” is actually not the name; with the right name we can read data, but there is no guarantee which group this channel comes from, and further we can’t know whether additional groups have recorded this type of data.

second try

as mentioned, mdf4 groups have attributes, so how about accessing these attributes. so far, the especially helpful attributes are:

  • groups:list, list of data group dicts

  • channels_db:dict, used for fast channel access by name. for each name key the value is a list of (group index, channel index) tuples

channels = reader.groups[2]  # with the same order as found in mdf, even with index 0, 1, ...
print("type of channels: ", type(channels))  # dict
if "fus.ObjData_Qm_TC.FusObj.i17.alpPiYawAngle" in channels: print(channels["fus.ObjData_Qm_TC.FusObj.i17.alpPiYawAngle"].id)  # none
print(" chanels type: ", type(channels["channels"]))  # list
print(" chanels len: ", len(channels["channels"]))  # 577
print("chanel_group type: ", type(channels["channel_group"]))  # ChannelGroup object
print("chanel_group len: ", len(channels["channel_group"]))  # 17
print(" data_group type: ", type(channels["data_group"]))  # DataGroup object
print(" data_group len: ", len(channels["data_group"]))  # 10

which is far less than expected, but gives a good exploration of the reader’s attributes.

third try

reader = MDF4(file_name)
cdb = reader.channels_db
# print("size of c_db ", len(cdb))
# print("type of cdb ", type(cdb))
# for more signals of interest, please append to this list
sig_list = ['sensor_1', 'sensor_2', 'sensor_3']
for sig in sig_list:
    if sig in cdb:
        print(cdb[sig])
        for item in cdb[sig]:
            (gid, cid) = item
            alp = reader.get(sig, gid, cid)
            print("type of current sig ", type(alp))
            print("size of current sig ", len(alp))
            # alp is the raw sig data; this is only a test print
    else:
        print(sig, " is not in cdb")

which gives the right way to access the interested signals’ raw data. combining this with s3 streaming body reads, as mentioned in the previous mdf-reader blog, finally gives a powerful pipeline from reading mdf4 files in s3 storage down to the downstream application.
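a minimal sketch of that glue, assuming boto3 on the s3 side and that asammdf accepts a file-like object here (bucket, key and signal names are illustrative):

import io
import boto3
from asammdf import MDF4

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="adas-records", Key="road_test/basic.mf4")
buffer = io.BytesIO(obj["Body"].read())   # StreamingBody -> in-memory file-like object

reader = MDF4(buffer)
for sig in ["sensor_1", "sensor_2"]:
    if sig in reader.channels_db:
        for (gid, cid) in reader.channels_db[sig]:
            data = reader.get(sig, gid, cid)
            print(sig, len(data))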

refer

python regex

python substring

design a nodejs web server from db to s3

Posted on 2020-03-05 |

background

during ADS road tests, there are Tb~Pb amounts of data generated, e.g. sensor raw data (rosbag, mf4), images, point cloud, etc. the previous few blogs focus on data storage; for a more robust and user-friendly data/file server, we also need to consider the database and UI.

from the end-user/engineers viewpoint, a few basic functions are required:

  • query certain features and view the data/files filtered
  • download a single/a batch of interested files (for further usage or analysis)
  • upload a large amount of files quickly to storage (mostly by admin users)

an FTP server to support s3

traditionally, for downloading many large files, FTP is commonly used. on the pros and cons of http vs ftp for transferring files: HTTP is more responsive for request-response of small files, but FTP may be better for large files if tuned properly. nowadays most prefer HTTP. searching a little more, there are a lot of discussions on connecting amazon s3 to an ftp server:

transfer files from s3 storage to ftp server

FTP server using s3 as storage

using S3 as storage for attachments in a web-mail system

FTP/SFTP access to amazon s3 bucket

and there are popular ftp clients which support s3, e.g. winSCP, cyberduck; of course, aws has its own sftp client, as well as an aws s3 browser windows client; for more client tools check here

however, ftp can’t do metadata queries. for some cases, e.g. resimulation of all stored scenarios, where each scenario is treated the same, we can grab them one by one and send them to the resimulator; but for many other cases, we need a certain pattern of data rather than reading the whole storage, and then a sql filter is much more efficient and helpful. so a simple FTP is not enough in these cases.

s3 objects/files to db

starting from a common bs framework, e.g. react + nodejs; nodejs can talk to the db as well.

  • nodejs queries bucket/object header info from the s3 server, and updates these metadata into the db.

there is a great discussion about storing images in db - yea or nay: when managing many TB of images/mdf files, storing file paths in the db is the best solution:

  • db storage is more expensive than file system storage
  • you can super-accelerate file system access, e.g. the os sendfile() system call asynchronously sends a file directly from the fs to a network interface; sql can’t
  • the web server needs no special coding to access images in the fs
  • db wins out where transactional integrity between the image/file and its metadata is important, since it’s more complex to manage integrity between db metadata and fs data, and it’s difficult to guarantee data has been flushed to disk in the fs

so for this file server, the metadata includes the file path in s3, plus other items users are interested in.

| file_id | feature | file_path |
| --- | --- | --- |
| 1 | 1 | http://s3.aws.amazon.com/my-bucket/item1/img1a.jpg |
| 2 | 2 | http://s3.aws.amazon.com/my-bucket/item1/img1b.jpg |
| 3 | 3 | http://s3.aws.amazon.com/my-bucket/item2/img2a.jpg |
| 4 | 4 | http://s3.aws.amazon.com/my-bucket/item2/img2b.jpg |

  • during a browser user query/list request, nodejs talks to the db, which is a normal bs case.

  • when the browser user wants to download a certain file, nodejs parses the file metadata and talks to s3

nodejs to s3

nodejs fs.readFile()

taking an example from official nodejs fs doc:

const fs = require('fs');
fs.readFile(pathname, function(err, data){   // the async variant takes a callback
    if(err){
        res.statusCode = 500;
        res.end(`Err getting file: ${err}`);
    }else{
        res.end(data);
    }
});
const fileUrl = new URL('file:///tmp/mdf');
fs.readFileSync(fileUrl);

if not reading the file directly, the fs.ReadStream class may be another good choice to read an s3 streaming object. fs.readFile() and fs.readFileSync() both read the full content of the file into memory before returning the data, which means big files are going to have a major impact on memory consumption and speed of execution. another choice is fs-readfile-promise.

express res.download

the res object represents the HTTP response that an Express app sends when it gets an HTTP request. see expressjs res.download:

res.download('/ads/basic.mf4', 'as_basic.mf4', function(err){
    if(err){
        log(`download file error: ${err}`);
    }else{
        log(`download file successfully`);
    }
});

aws sdk for nodejs

taking an example from aws sdk for js

var aws = require('aws-sdk');
var s3 = new aws.S3();
s3.createBucket({Bucket: your_bucket_name}, function(){
    var params = {Bucket: your_bucket_name, Key: your_key, Body: mf4.streaming};
    s3.putObject(params, function(err, data){
        if(err) console.log(err);
        else console.log("upload data to s3 successfully");
    });
});

check aws sdk for nodejs api for more details.

in summary

either an FTP server or a nodejs server: it depends on the upper usage cases.

  • for a single large-size (>100mb) file (e.g. mf4, rosbag) download, nodejs with db is ok, as the db helps to filter out the file first, and a few minutes of download time is acceptable

  • for many little-size (~1mb) files (e.g. image, json) to download, nodejs is strong without doubt.

  • for many large-size files to download/upload, where a friendly UI matters less than performance, FTP may be the solution.
