to test newly developed algorithms against the massive amount of simulation data, we need a distributed simulation platform built on the Spark distributed framework.
simulation based on synthetic data, used for control and planning
simulation based on real data playback, used to test the function and performance of different components
in an autonomous driving system, each functional module is deployed as a ROS node, and communication between nodes relies on messages with well-defined formats. since modules interact only through these messages, each module can be tested independently, and we can develop a simulation module for each functional module.
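this decoupling can be shown with a minimal sketch (pure Python, not actual ROS code; the message type and node below are made up for illustration): a module depends only on the message format, so a simulator can feed it recorded or synthetic messages directly.

```python
from dataclasses import dataclass

# A well-defined message format, analogous to a ROS .msg definition.
# (Hypothetical type for illustration only.)
@dataclass
class PoseMsg:
    x: float
    y: float
    heading: float

# A functional module that only depends on the message format,
# so it can be tested in isolation by feeding it simulated messages.
class LocalizationNode:
    def __init__(self):
        self.last_pose = None

    def on_message(self, msg: PoseMsg):
        self.last_pose = msg

# Simulation: inject a message instead of running the full stack.
node = LocalizationNode()
node.on_message(PoseMsg(x=1.0, y=2.0, heading=0.5))
```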
anatomy of simulator
the simulator needs a vehicle dynamics model of the car; it also needs a model of the external environment, which includes both static and dynamic scenes.
the simulator decomposes the external environment into basic elements and recombines them to generate a variety of test cases, each simulating a specific scenario.
e.g. the position, speed, and next-step command of an obstacle vehicle are basic elements that can each be varied to produce different scenarios.
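the recombination step can be sketched as a cartesian product over the basic elements (the value ranges below are hypothetical, chosen only to illustrate the idea):

```python
from itertools import product

# Basic elements of a dynamic scene (hypothetical values for illustration).
positions = ["ahead", "left-lane", "right-lane"]
speeds_mps = [0.0, 5.0, 15.0]
next_commands = ["keep-lane", "cut-in", "brake"]

# Recombine the basic elements into test cases, each a specific scenario.
scenarios = [
    {"position": p, "speed": s, "command": c}
    for p, s, c in product(positions, speeds_mps, next_commands)
]
print(len(scenarios))  # 3 * 3 * 3 = 27 scenarios
```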
ROS based simulator
reproducing real scenes from recorded traffic data requires a distributed simulation platform.
ROSBAG: records messages from Topics and replays them back to Topics.
the Record function creates a recording node in ROS, calls the subscribe method to receive ROS messages from all Topics, and then writes the messages to a Bag file.
the Play function establishes a playback node and calls the advertise method to send the messages in the bag to the specified Topics.
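the record/play pattern can be modeled in a few lines of pure Python (a toy stand-in, not the rosbag API: real rosbag writes a binary Bag file and uses actual ROS pub/sub):

```python
# Toy pub/sub bus standing in for ROS Topics.
class Bus:
    def __init__(self):
        self.subs = {}          # topic -> list of callbacks

    def subscribe(self, topic, cb):
        self.subs.setdefault(topic, []).append(cb)

    def publish(self, topic, msg):
        for cb in self.subs.get(topic, []):
            cb(topic, msg)

# Record: subscribe to all topics of interest, append messages to a "bag".
def record(bus, topics, bag):
    for t in topics:
        bus.subscribe(t, lambda topic, msg: bag.append((topic, msg)))

# Play: re-publish every stored message to its original topic.
def play(bus, bag):
    for topic, msg in bag:
        bus.publish(topic, msg)

live, bag = Bus(), []
record(live, ["/camera", "/lidar"], bag)
live.publish("/camera", b"frame-0")
live.publish("/lidar", b"scan-0")

replay, received = Bus(), []
replay.subscribe("/camera", lambda t, m: received.append(m))
play(replay, bag)
```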
Spark distributed platform
the Spark driver launches different simulation applications, e.g. localization algorithms, object recognition algorithms, vehicle decision-making and control algorithms, etc., and allocates resources to each Spark worker; each worker first reads the RosBag data into memory and launches a ROS node to process the incoming data.
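the driver/worker split can be sketched as follows (pure Python, not actual Spark; the partitioning and the worker function are invented stand-ins): the driver hands each worker a partition of the recorded bag data, and each worker processes its partition the way its ROS node would process incoming messages.

```python
from concurrent.futures import ThreadPoolExecutor

def worker_process(partition):
    # Stand-in for "read RosBag data into memory and launch a ROS node";
    # here we just compute the size of each message.
    return [len(msg) for msg in partition]

# The "driver" partitions the recorded data across workers.
bag_partitions = [[b"frame0", b"frame11"], [b"scan000"]]
with ThreadPoolExecutor(max_workers=2) as ex:
    results = list(ex.map(worker_process, bag_partitions))
print(results)  # [[6, 7], [7]]
```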
the interface between Spark and ROS is a Linux pipe: data written to the write end of the pipe is buffered by the kernel until it is read from the read end.
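the pipe semantics can be demonstrated directly with `os.pipe` (a minimal sketch of the mechanism, not the actual Spark-to-ROS plumbing):

```python
import os

# Kernel-buffered pipe: bytes written to the write end are held by the
# kernel until read from the read end. This is the mechanism a Spark
# worker and a ROS node process could use to exchange data.
r, w = os.pipe()
os.write(w, b"lidar-frame-bytes")
os.close(w)                      # close write end -> EOF for the reader
data = os.read(r, 4096)
os.close(r)
print(data)  # b'lidar-frame-bytes'
```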
two problems: 1) Spark only support text-based data consuming; 2) Spark memeory to ROBag
Binary data streaming
the core Spark data structure is the resilient distributed dataset (RDD). we need to transform binary input data into a user-defined format for processing, and transform the output of the Spark computation back into a byte stream, or even further into a generic binary file (e.g. on HDFS).
1) encode and serialize the binary files (image, LiDAR input data) to form a binary byte stream
2) de-serialize and decode the binary stream, interpreting the byte stream into an understandable format, and perform the target computation
3) the output is then encoded and serialized before being passed into RDD partitions (or written to HDFS), and returned to the Spark driver
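steps 1 and 2 can be sketched with a simple length-prefixed framing scheme (a hypothetical record layout chosen for illustration; the actual encoding used by the platform is not specified here):

```python
import struct

# Hypothetical record layout: a 4-byte big-endian length prefix followed
# by the raw sensor payload (e.g. an image or LiDAR blob).
def encode(payloads):
    """Step 1: serialize binary payloads into one byte stream."""
    stream = b""
    for p in payloads:
        stream += struct.pack(">I", len(p)) + p
    return stream

def decode(stream):
    """Step 2: de-serialize the byte stream back into payloads."""
    payloads, off = [], 0
    while off < len(stream):
        (n,) = struct.unpack_from(">I", stream, off)
        off += 4
        payloads.append(stream[off:off + n])
        off += n
    return payloads

frames = [b"\x00imagebytes", b"lidarpoints"]
stream = encode(frames)           # the byte stream handed to Spark
assert decode(stream) == frames   # step 3 would re-encode results the same way
```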
data retrieval through ROSbag cache
two things: reading from memory through ROSbag play, and writing to memory through ROSbag record.
solution: design a MemoryChunkedFile class, derived from the ChunkedFile class, to read/write memory rather than files.
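the idea can be illustrated with a Python analogue (rosbag's actual ChunkedFile is a C++ class that reads/writes disk files; this in-memory stand-in only sketches the substitution):

```python
import io

# Illustrative analogue of deriving MemoryChunkedFile from ChunkedFile:
# the backing store is RAM instead of a file on disk, so rosbag record
# can write into Spark memory and rosbag play can read back from it.
class MemoryChunkedFile:
    def __init__(self):
        self.buf = io.BytesIO()   # memory buffer instead of a disk file

    def write(self, data: bytes):
        self.buf.write(data)

    def seek(self, pos: int):
        self.buf.seek(pos)

    def read(self, n: int) -> bytes:
        return self.buf.read(n)

mem = MemoryChunkedFile()
mem.write(b"chunk-header")        # "record" path: write to memory
mem.write(b"chunk-payload")
mem.seek(0)                       # "play" path: read back from memory
print(mem.read(12))  # b'chunk-header'
```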