Thursday, March 17, 2016

Downloading WebRTC using Ubuntu

Hi,


This post explains the basic steps that one needs to follow for downloading WebRTC code base using Ubuntu.

My Ubuntu Virtual Box configurations are - 
Linux vm2 3.19.0-28-generic #30~14.04.1-Ubuntu SMP Tue Sep 1 09:32:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux



The main steps are:

  • mkdir webrtc-checkout  // Create a new folder
  • cd webrtc-checkout
  • sudo apt-get install git
  • git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git   // A folder named depot_tools will be created in your current directory
  • export PATH=/home/rohit/webrtc-checkout/depot_tools:"$PATH"  // Add depot_tools  to the PATH
  • sudo apt-get install g++ python libnss3-dev libasound2-dev libpulse-dev libjpeg62-dev libxv-dev libgtk2.0-dev libexpat1-dev  // Necessary packages
  • sudo apt-get install openjdk-7-jdk  // It gets installed in /usr/lib/jvm/java-7-openjdk-amd64
  • export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  • fetch --nohooks webrtc  // Afterwards 'ls -la' shows the newly created .gclient, .gclient_entries and src
  • gclient sync  // This can take several hours to complete, as the total code size runs to gigabytes

More details are available at the official WebRTC link:
https://webrtc.org/native-code/development/

It is better not to use a virtual machine: it can give several issues, and the code download may stop midway. A separate Linux PC is preferable.

Sunday, May 11, 2014

ARM Processors : In-order execution versus Out-of-order execution

Every program gets converted into its corresponding machine language instructions. In-order and out-of-order execution differ in how the processor executes these instructions. The examples below are with respect to an ARM processor.

Consider the below instructions (r0 ~ r6 are registers):
1) mov r0, #3    // Moves value 3 to register r0
2) mov r1, #5    // Moves value 5 to register r1
3) mov r2, #6    // Moves value 6 to register r2
4) add r4, r1, r2  // Adds contents of r1 and r2 and stores in r4
5) add r5, r0, r0  // Adds contents of r0 and r0 and stores in r5, i.e. r5 = 2*r0
6) add r6, r2, r3  // Adds contents of r2 and r3 and stores in r6

Imagine that the processor can execute two instructions per cycle.

In-order execution:
Instructions will be executed as
Cycle 1: Instruction_1 + Instruction_2 (the two instructions do not depend on each other)

Cycle 2: Instruction_3 only (Instruction_4 cannot issue yet, because it depends on r2, which is updated by Instruction_3)

Cycle 3: Instruction_4 + Instruction_5 (the two instructions do not depend on each other)

Cycle 4: Instruction_6

Total cycles consumed is 4.

Out-of-order execution:
Instructions will be executed as
Cycle 1: Instruction_1 + Instruction_2 (the two instructions do not depend on each other)

Cycle 2: Instruction_3 + Instruction_5 (Instruction_5 only needs r0, so it can be moved ahead of Instruction_4)

Cycle 3: Instruction_4 + Instruction_6 (their operands are now ready, and they do not depend on each other)

Total cycles consumed is 3.



Wednesday, July 10, 2013

GPGPU

GPGPU stands for General-Purpose computing on a Graphics Processing Unit. Consider a system that has a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit). GPGPU is used to reduce the workload of the CPU: the activities that can be done in parallel are transferred from the CPU to the GPU. The work pressure on the CPU decreases, and so the speed of operation increases.

Consider an element-wise multiplication of two 100-element arrays (a simplified stand-in for matrix multiplication). A normal C code executed on the CPU would be

void matrix_mult_cpu(int *a, int *b, int *c, int n)
{
    int i;
    for (i = 0; i < n; i++) {   /* Let n = 100 */
        a[i] = b[i] * c[i];     /* element-wise product */
    }
}

This for loop executes 100 times. If one iteration takes 1 ms (millisecond), the entire loop takes 100 ms.

This computation is an example of data parallelism: the operation to be performed (multiplication) is the same for all elements; only the data changes. If this code is transferred to the GPU, all 100 iterations can be handled at once, because the GPU contains a collection of individual execution elements called work-items. The data for each iteration is taken and handed to its own work-item.
  • The first work-item receives b[0] and c[0], using which it calculates a[0].
  • The second work-item receives b[1] and c[1], using which it calculates a[1].
  • The 100th work-item receives b[99] and c[99], using which it calculates a[99].

One multiplication takes 1 ms. Since all the work-items operate at the same time on the GPU, the result of all 100 multiplications is obtained in about 1 ms. The same operation on the CPU took 100 ms!

In this manner, when doing GPGPU programming, all the activities that can be done in parallel should be given to the GPU, and only the serial tasks should stay on the CPU.


Saturday, June 16, 2012

Linear quantization and Non Linear quantization


Linear quantization: here the quantization step size is uniform, so signals with small amplitude and large amplitude are quantized with the same step size. In the linear method the quantization error can therefore be higher, because the quantized results are poor for very-small-amplitude signals.



In the below figure, uniform quantization occurs at 8 different input signal levels: 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5 and 4.0.
So here the uniform quantization step size is 0.5.
Non-linear quantization: here the quantization step size varies, e.g. logarithmic quantization (the A-law and µ-law speech codecs are good examples of logarithmic quantization).
When considering speech signals, the majority are of low amplitude, so we need to concentrate more on these low-amplitude signals. Signals with small amplitude are therefore quantized with smaller quantization steps, and signals with large amplitude with larger step sizes. As a result, the quantization error for low-amplitude signals decreases while the error for high-amplitude signals increases; but in effect the total quantization error decreases, because high-amplitude speech signals occur only rarely.


In the below figure, non-uniform quantization occurs at 8 different input signal levels: 0.1, 0.3, 0.6, 1.0, 1.5, 2.2, 3.0 and 4.0.
So the non-uniform quantization step sizes are 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.8 and 1.0.





Logarithmic quantization is an example of non-linear quantization: the quantization step size is non-uniform, very small at the start and then increasing.

In audio codecs, non-linear quantization techniques are used: small-amplitude signals get a small step size and large-amplitude signals a larger one. In doing so, the quantization error that occurs for low-amplitude signals can be decreased. Non-linear quantization is selected because even a small error or noise in low-amplitude signals is easily detected by the ear.