1 Deep Reinforcement Learning link
apps: motion planning
2 Convolutional Neural Networks link
apps: end-to-end driving task (pedestrian detection)
3 Recurrent Neural Networks link
apps: steering control through time
I have used C++ for a couple of years already, but never took a close look at it. Looking back at a few C++ demo projects from school, I had a strong feeling that, at the time, making the code run was my only goal; how it worked, or how to improve it, was always ignored. I am afraid of the language details, as the wisdom says: the devil is in the details.
after working for three years, I feel it necessary and urgent to go back to the language itself (e.g. Linux, C/C++) as the first step to move forward in application development. frameworks and engineering-oriented APIs are closer to the final product, which makes them more attractive than the details of how things are implemented; like the mechanical-engineering undergraduate who runs ABAQUS for the first time and feels so good about the beautiful visualized results.
anyway, I have to delay that short-lived satisfaction. C is clean and C applications have a clear structure; C++ is more metaphysical, and I don't even know where to start. even though I thought I knew C++ well, there are actually many details behind it, e.g. the allocator in the STL.
templates are used for generic programming, e.g. vector<int> and vector<double> share one implementation.
the actual meaning of TYPE is deduced by the compiler from the argument passed to the function; "class" and "typename" are interchangeable here.
templates support default parameters; once the i-th parameter is given a default value, all parameters after it must have default values too.
you can feel the power of templates in the STL source code.
references:
<<The Annotated STL Sources (using SGI STL)>> by jjHou
a visitor's guide to C++ allocators
the annotated STL source
it took me several days to get started, because the first section, on allocators, already blocked me.
the logic of a dynamic container doesn't depend on the specifics of how memory is obtained; namely, the design of container classes should be decoupled from the memory allocation policy.
memory allocation and object construction are separated; an allocator has four operations: allocate(), deallocate(), construct(), destroy(). there is a standard interface for allocators.
when we say A is an allocator for T, where T is a type, e.g. AllocatorTraits::value_type, we mean that A knows how to obtain and release memory to store objects of type T. git:allocator
iterators are used to build algorithms over containers, though I don't really get the traits yet.
std::vector is dynamic and contiguous.
std::list is a circular doubly-linked list.
std::deque can operate on elements at both ends; its memory is multi-sectional, with each section linearly contiguous, combining the advantages of vector and list.
std::stack operates only on the top element (first in, last out) and can't be iterated; the default underlying container for stack is deque.
std::queue supports only popping elements from the front and pushing elements at the back; the default underlying container is deque.
std::priority_queue is a queue whose elements are ordered by priority.
actually this post should be named "pure C multithreads", but it's better to write some real code: a simple threadpool in pure C.
in the simplest case, what should a threadpool object do?
thpool_init(num_threads);
thpool_destroy(threadpool_obj);
thpool_add_work(callback, args);
// so a user-defined event callback() can be passed in
the threadpool assigns jobs (from a job list) to threads, so the job list should be an internal class; it's better to package thread and job as separate classes too. to add work, what kinds of functions does the job list need? at least one to add a new job (from outside) to the job list.
will multiple threads call add_task() simultaneously? if so, the jobqueue should have a mutex object;
during threadpool initialization, will all threads be created simultaneously and immediately? if not, there should be thread status flags (creating, existing-but-idle, working, dead), and updating these flags needs to be protected by a mutex object;
this is interesting: what's in my mind when designing a lib. hopefully I'll implement it by the end of the week.
this is a review of the GNU C library. a socket is a two-way communication channel: both read and write can be performed at either end.
how to debug server/client code? chatRoom
this is a review of GNU C programming.
there are two basic mechanisms for representing the connection between your program and a file: streams & file descriptors. a file descriptor is represented as an object of type int; a stream is represented as a FILE * object.
a stream is an abstract concept representing a communication channel to a file or a device; think of it as a sequence of characters, with functions to take characters out of one end and put characters into the other end, like a pipe.
the file position is an integer representing the number of bytes from the beginning of the file; each time a character is read or written, the file position is incremented. namely, access to a file is sequential.
usually a "block" stands for fixed-size binary data or text, rather than characters or lines.
a stream and a file do not communicate character-by-character; there is a buffer in between for I/O.
async I/O, event I/O, interrupt driven I/O, GPIO …
/etc/profile -> system-wide shell environment & startup settings
~/.bashrc -> per-user definitions for shell funcs & aliases
~/.bash_profile -> per-user login environment
usually during application configuration on Mac or Linux, ~/.bashrc is the place to add project-related environment variables.
MPI gdb (each MPI process gets an independent terminal window):
mpirun -np #cpus xterm -e gdb ./exe
set breakpoint in different src:
b sth.cxx:20
print the first `length` elements of an array:
p *a@length
add input file:
set args -j input_file
load the executable (and its sources) after starting GDB:
gdb
file exe
set args -j input_file
bash guide for beginners
tao of regular expression
Linux programmer’s manual
Linux system administrators guide
expert C programming
what every programmer should know about CPU caches
understand CPU utilization & optimization
Intel: optimizing applications for NUMA
Intel guide for developing multithreaded applications
MPI parallel programming in Python
considerations in software design for multi-core, multiprocessor architectures
how to optimize GEMM
optimize for Intel AVX using MKL with DGEMM
GEMM: from pure C to SSE optimized micro kernel
a practical guide to SSE SIMD with C++
multi-core design
Unix and pthread programming
introduction to post processing finite element results with AVS
CAE Linux: FEA inter-operability
open sourcing a Python project the right way
PPSS
pyNastran
hdfviewer
valgrind
starting from 2016, I went through every hot topic of the day and even did some study in DL, AV, CV, etc. every time the passion burst out and I promised to, e.g., study a framework or contribute to an open project; in reality the passion soon died away. it's like a new and very attractive concept popping up in the market: no business model can carry it, so it dies out.
the downside of this learning pattern is that the fundamentals are ignored; e.g. I can't succeed in coding interviews and have little experience with basic algorithms. this kind of person always envisions the big picture but doesn't work out how to get there. it ends up as waste; right now I just want to stay focused.
typedef defines "gid" as an alias for "struct gid_". typedef can also alias a function handle/pointer, which is often used in asynchronous/event-callback programming. other data encapsulations are:
enum: maps a list of constant names to integral constants
union: stores multiple data types at one memory address
in C++, structure initialization can also be done in a constructor.
for better memory access on CPU architectures, memory alignment inside a structure matters; namely:
chars can start on any byte address
2-byte shorts must start on an even address
4-byte ints/floats must start on an address divisible by 4
8-byte doubles/longs must start on an address divisible by 8
the size of the whole structure is aligned to an integral multiple of the size of its largest member; e.g. sizeof(gid) = 24, not 8+8+4 = 20.
putting the member variables in ascending/descending order of size is good practice.
incrementing a gid* pointer steps forward by sizeof(gid); a structure also supports self-reference:
another common pattern is an array of structures:
in general, a structure can be passed to a function by value or by pointer, but not by reference in pure C; likewise, a structure returned from a function can be a value or a pointer.
in C++, a structure supports member functions and is the same as a class whose members are public by default. initialization can be done either in a constructor or directly during definition. see the differences of struct between C and C++.
in pure C, a string is a char array terminated with NUL ('\0'). an array of char (a string) is initialized with a string literal (a double-quoted string).
in general, a string literal is used as the initializer for an array of char; anywhere else, it is a static array of chars, which shouldn't be modified through a pointer.
also be aware of the downsides of C strings.
when a derived object calls a base-class member function, which in turn calls a virtual member function that is overridden in the derived class, which virtual member function is actually called?
output is:
ACM2::getDataIndex
so the derived object always calls the override closest to its own class, even when the call is made from inside a base-class member function.
uniform memory architecture (UMA): all CPUs in the same socket share the memory, and memory I/O is the bottleneck. non-uniform memory access (NUMA): each CPU has its own local memory, and inter-CPU memory access is remote.
numactl --cpunodebind
MPI binding APIs
memory allocation strategy in NUMA
interleave: place memory on alternating nodes;
the first memory portion on node0, the next
portion on node1
membind: force memory allocation on a specific node
the benefit of CPU affinity is reduced context switching: a specific process is bound to a specific CPU, so the data that process needs (with no other process's data switched in) can stay in that CPU's cache. this is especially useful on NUMA architectures for accessing data locally, and especially true when the application generates a large cache footprint, e.g. in scientific computing.
for general applications, CPU affinity may reduce performance, since it prevents the CPU scheduler from working properly.
/proc/cpuinfo
physical id: socket index
siblings: number of cores in each socket
core id: current core index
e.g. Ford HPC CPU architecture:
2 or 4 CPU sockets group into one computing node
each socket has 10 or 12 CPU cores
each socket has an on-board shared memory
CPU physical mechanism:
physically there is a bus #LOCK pin.
if a LOCK prefix is added before an assembly
instruction, the #LOCK pin is pulled down
for the duration of that instruction,
so the bus is locked only for the current
instruction
cache coherence:
the unit of cache transfer between CPU and main memory is the cache line. within one socket, since sibling cores share the L3 cache, there is a scenario where CPU1 modifies a variable but has not yet written it back to main memory, while CPU2 reads the old value from main memory and modifies it again; then the variable seen by CPU1 and CPU2 is not coherent.
volatile in C forces the variable to be re-read
from memory on every access, to avoid using a
stale cached copy
in another scenario, cache coherence must be maintained while variables sharing one cache line are read and written repeatedly by multiple cores; performance degrades badly: "false sharing".
lock:
semaphore, also called a "sleeping lock": used when
the lock needs to be held for a longer time
spin lock: held by at most one thread, blocking
other threads from the critical section; used
when the lock is held only for a short time
write/read lock:
resident memory (RES) is the portion of the virtual memory space that is actually in RAM; swapped memory: when physical memory is full and the system needs more, inactive pages are moved out to swap space, and usable memory is swapped back in.
virtual memory = resident memory + swapped memory
mmap: access a remote data block as if it were local RAM.
voluntary context switches (vcs):
occur when a thread makes a system call that blocks;
vcs measures the frequency of blocking system I/O
involuntary context switches (ivcs):
occur when a thread has been running too long
without making a blocking system call, and other
processes are waiting for the CPU; the OS then
switches it out in favor of others. ivcs measures
CPU competition: an unfinished process is switched off
in general, with more threads the context-switch cost increases: the total number of switches grows, and each switch is also more expensive, since CPU cache is limited and each process can hold less data in cache.
cpu_time (clock_t):
the total amount of CPU time that a process has actually used
user CPU time:
the amount of time spent running in user space
system CPU time:
the amount of time spent running in kernel space
wall-clock time:
the elapsed time from process start to finish
inter-process communication (IPC) shared memory is used in our application to work together with MPI: IPC helps balance memory distributed across multiple computing nodes, while the MPI processes are the workhorses consuming the shared data. there are other IPC libs, e.g. Boost.Interprocess.
ftok() -> generate an IPC key based on a given file path
shmget() -> create a shared memory segment and return a shm_id
shmat() -> attach to the shm_id and return a pointer to the shared memory
shmctl() -> control the segment (e.g. mark it for removal with IPC_RMID)
shmdt() -> detach the pointer from the segment
in our design, all processes call shmget() to get a handle to the shared memory section, but only the master actually creates it; the others attach and read. since every process can access the shared section, it's important to keep the pointer first returned to the master clean/unpolluted.
there are a few basic MPI APIs.
in our project, to benefit both CPU and memory performance, we actually need to: 1) subgroup the MPI communicator, 2) bind MPI processes to sockets.
MPI_Comm_group()
MPI_Group_incl()
MPI_Comm_create()
APIs and numactl at run time:
mpirun -bycore -bind-to-socket -> bind processes to each core on the socket
mpirun -bysocket -bind-to-socket
mpirun numactl --membind=0 -np 32 ./run