Thursday, March 17, 2016

Downloading WebRTC using Ubuntu

Hi,


This post explains the basic steps that one needs to follow for downloading WebRTC code base using Ubuntu.

My Ubuntu Virtual Box configurations are - 
Linux vm2 3.19.0-28-generic #30~14.04.1-Ubuntu SMP Tue Sep 1 09:32:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux



The main steps are:

  • mkdir webrtc-checkout  // Create a new folder
  • cd webrtc-checkout
  • sudo apt-get install git
  • git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git   // A folder named depot_tools will be created in your current directory
  • export PATH=/home/rohit/webrtc-checkout/depot_tools:"$PATH"  // Add depot_tools  to the PATH
  • sudo apt-get install g++ python libnss3-dev libasound2-dev libpulse-dev libjpeg62-dev libxv-dev libgtk2.0-dev libexpat1-dev  // Necessary packages
  • sudo apt-get install openjdk-7-jdk  // It gets installed in /usr/lib/jvm/java-7-openjdk-amd64
  • export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  • fetch --nohooks webrtc  // Afterwards 'ls -la' shows the newly created .gclient, .gclient_entries and src
  • gclient sync  // This can take several hours to complete, as the total code size runs to gigabytes

More details are available at the official WebRTC link:
https://webrtc.org/native-code/development/

It is better not to use a virtual machine: it can give several issues, and the code download may stop midway. A separate Linux PC is preferable.

Sunday, May 11, 2014

ARM Processors : In-order execution versus Out-of-order execution

Every program gets converted into its corresponding machine language instructions. In-order and out-of-order execution differ in how the processor executes these instructions. The examples below are with respect to an ARM processor.

Consider the below instructions (r0 ~ r6 are registers):
1) mov r0, #3    // Moves value 3 to register r0
2) mov r1, #5    // Moves value 5 to register r1
3) mov r2, #6    // Moves value 6 to register r2
4) add r4, r1, r2  // Adds contents of r1 and r2 and stores in r4
5) add r5, r0, r0  // Adds contents of r0 and r0 and stores in r5, i.e. r5 = 2*r0
6) add r6, r2, r3  // Adds contents of r2 and r3 and stores in r6

Imagine that the processor can execute two instructions per cycle.

In-order execution:
Instructions will be executed as
Cycle 1: Instruction_1 + Instruction_2 (the two instructions do not depend on each other)

Cycle 2: Instruction_3 only (Instruction_4 cannot issue yet, because it depends on r2, which is updated by Instruction_3)

Cycle 3: Instruction_4 + Instruction_5 (the two instructions do not depend on each other)

Cycle 4: Instruction_6

Total cycles consumed is 4.

Out-of-order execution:
Instructions will be executed as
Cycle 1: Instruction_1 + Instruction_2 (the two instructions do not depend on each other)

Cycle 2: Instruction_3 + Instruction_5 (Instruction_5 only needs r0, so it can be moved ahead of Instruction_4)

Cycle 3: Instruction_4 + Instruction_6 (their operands are now ready, and they do not depend on each other)

Total cycles consumed is 3.



Wednesday, July 10, 2013

GPGPU

GPGPU stands for General-Purpose computing on a Graphics Processing Unit. Consider a system that has a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit). GPGPU is used to reduce the workload of the CPU: the activities that can be done in parallel are transferred from the CPU to the GPU. The work pressure on the CPU decreases, and so the speed of operation increases.

Consider an element-wise multiplication of two 100-element arrays (a simplified stand-in for matrix multiplication). A normal C code executed on the CPU would be

void matrix_mult_cpu(int *a, int *b, int *c, int n)
{
    int i;
    for (i = 0; i < n; i++) {   /* Let n = 100 */
        a[i] = b[i] * c[i];     /* element-wise product */
    }
}

This for loop executes 100 times. If one iteration takes 1 ms (millisecond), the entire loop takes 100 ms.

This computation is an example of data parallelism: the operation to be performed (multiplication) is the same for all elements; only the data changes. If this code is transferred to the GPU, all 100 iterations can be handled at once, because the GPU contains a collection of individual execution elements called work-items. The data for each iteration is taken and handed to its own work-item.
  • The first work-item receives b[0] and c[0], using which it calculates a[0].
  • The second work-item receives b[1] and c[1], using which it calculates a[1].
  • The 100th work-item receives b[99] and c[99], using which it calculates a[99].

One multiplication takes 1 ms. Since all the work-items operate at the same time on the GPU, the result of all 100 multiplications is obtained in about 1 ms. The same operation on the CPU took 100 ms!

In this manner, when doing GPGPU programming, all the activities that can be done in parallel should be given to the GPU, and only the serial tasks should stay on the CPU.


Saturday, June 16, 2012

Linear quantization and Non Linear quantization


Linear quantization: here the quantization step size is uniform, so signals with small amplitude and large amplitude are quantized with the same step size. In the linear method the quantization error can therefore be higher, because the quantized results are poor for very-small-amplitude signals.



In the below figure, uniform quantization occurs at 8 different input signal levels: 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5 and 4.0.
So here the uniform quantization step size is 0.5.
Non-linear quantization: here the quantization step size varies, e.g. logarithmic quantization (the A-law and µ-law speech codecs are good examples of logarithmic quantization).
When considering speech signals, the majority are of low amplitude, so we need to concentrate more on these low-amplitude signals. Signals with small amplitude are therefore quantized with smaller quantization steps, and signals with large amplitude with larger step sizes. As a result, the quantization error for low-amplitude signals decreases while the error for high-amplitude signals increases; but in effect the total quantization error decreases, because high-amplitude speech signals occur only rarely.


In the below figure, non-uniform quantization occurs at 8 different input signal levels: 0.1, 0.3, 0.6, 1.0, 1.5, 2.2, 3.0 and 4.0.
So the non-uniform quantization step sizes are 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.8 and 1.0.





Logarithmic quantization is an example of non-linear quantization: the quantization step size is non-uniform, very small at the start and then increasing.

In audio codecs, non-linear quantization techniques are used: small-amplitude signals get a small step size and large-amplitude signals a larger one. In doing so, the quantization error that occurs for low-amplitude signals can be decreased. Non-linear quantization is selected because even a small error or noise in low-amplitude signals is easily detected by the ear.