OpenCV Reading Notes

2020-03-20

Acknowledgements

  • Thanks to my classmate Shen for his guidance throughout this study
  • Thanks to Lu Heran for proofreading and correcting the English grammar

OpenCV self-study source code and selected notes

Covering all tutorials in the OpenCV-Python Tutorials

Every chapter is implemented in Python, and the code is published to opencv-turtorial-notes

Notes on a few core topics, which also exist as comments in the code, are collected separately below

Features

Scale-Invariant Feature Transform

Speeded-Up Robust Features

Features from Accelerated Segment Test

Binary Robust Independent Elementary Features

Oriented FAST and Rotated BRIEF

Camera

Camera Calibration

Pose Estimation

Epipolar Geometry

Machine Learning

K-Nearest Neighbour

Support Vector Machines

K-Means Clustering

OpenCV Mind Map


Afterword

The story goes back to May 2015. That summer, I graduated with a master's degree. At the defense, my thesis, with its line after line of typos, was given a score in the 80s by the committee.

Honestly, that is not a bad score. The reason it scored so high is probably that the thesis title, like those of everyone else present, read wonderfully smoothly, with a fine metrical cadence; its only flaw was that it was not easy to understand.

Back then, besides me, there was probably one other person who could understand it. Unfortunately, that person was not my advisor, but the almighty Lord.

Time flies. Looking back five years later, probably only the Lord can understand it now.

Whenever my dream reaches this scene, the thought in my heart swells like the film's voice-over, growing louder and louder until the volume distorts the whole reel and picture, leaving only a single Chaplin-style line:

"Because I have no damn idea what the hell I wrote in that thesis either....."

At work, every time I run into the terms that once accompanied my campus days, I feel a faint twinge of regret.

So as not to disgrace that thin sheet of paper reading "Communication and Information Systems, Master of Engineering", I had no choice but to ~~join the ¥9.9 "learn Python, reach your career peak" course~~ grind through the official tutorials bit by bit.

While I was grinding through the tutorials, someone finally knocked, stepped in, and shouted at me:

"Proletarians of all countries, unite! Огонь по готовности!"

Over Overwatch matches and the friendship built through them, Comrade Vasily Shen taught me the skills I had lost.

At last, in the spring of 2020, with the pandemic not yet gone and cherry blossoms filling Tokyo's Meguro River, I finally knew what my master's thesis was about!


OpenCV Machine Learning Note(III) K-Means

2020-03-11

Q1: What is Clustering?

Find k labels for a set of data and divide the data into k groups by those labels,
like the T-shirt size example in the document

Q2: Why do we need clustering?

It gives labels to the raw data.

Consider the training-data stage of kNN or SVM: we need responses that tell us what each sample is. How can we get those responses?

  1. assign labels to the data one by one, manually
  2. let some algorithm categorize the data set automatically

Clustering is the second choice.

Q3: Why does k-means sound so familiar?

In the video chapter, we used mean-shift to track moving objects. Actually, mean-shift and k-means clustering have some points in common:

  1. they both calculate a mean value and repeat
  2. they both need criteria to stop the repetition
  3. they both shift the center in every repetition

Q4: What’s the main problem of k-means clustering?

There are two main problems in k-means clustering (see the sketch after this list):

  1. how to choose the number of categories k
  2. how to choose the initial centroids of each category
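A minimal sketch of how these two problems surface in cv2.kmeans (the toy blobs and every parameter value here are my own illustrative choices, not from the tutorial):

```python
import numpy as np
import cv2

# Two 2D Gaussian blobs as toy data; cv2.kmeans expects float32.
rng = np.random.default_rng(0)
a = rng.normal(loc=(20.0, 20.0), scale=5.0, size=(50, 2))
b = rng.normal(loc=(60.0, 60.0), scale=5.0, size=(50, 2))
data = np.vstack((a, b)).astype(np.float32)

# Problem 1 is answered by hand here: K = 2.
# Stopping criteria: at most 10 iterations, or centers moving less than 1.0.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# KMEANS_PP_CENTERS (k-means++ seeding) is one stock answer to problem 2.
compactness, labels, centers = cv2.kmeans(
    data, 2, None, criteria, 10, cv2.KMEANS_PP_CENTERS)

print(centers)  # two cluster centers, close to (20, 20) and (60, 60)
```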

Q5: Do we have other clustering algorithms?

Yes, for example:

  • Mean-Shift Clustering
  • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  • Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
  • Agglomerative Hierarchical Clustering

OpenCV Machine Learning Note(II) SVM

2020-03-10

Q1: What is SVM (Support Vector Machines)?

from the document:

"So what SVM does is to find a straight line (or hyperplane) with largest minimum distance to the training samples."

Like kNN, SVM is also a "classification algorithm".

Q2: What problem does SVM want to solve?

  • save the memory!!!

as the document puts it:

"In kNN, for a test data, we used to measure its distance to all the training samples and take the one with minimum distance. It takes plenty of time to measure all the distances and plenty of memory to store all the training-samples......., should we need that much?"

Q3: What are the concepts in SVM?

  • Decision Boundary:

    the imaginary line (boundary) that can separate the samples on the plane

  • Linearly Separable/Non-Linearly Separable:

    "Linearly Separable" means that when all samples lie on a plane, they can be separated by a straight line. But if the sample data is not 2-dimensional, how can we separate it with a line?

    this situation is called "Non-Linearly Separable"

  • Support Vectors:

    the problem SVM wants to solve is that kNN needs a lot of memory to store the distances to all samples. SVM only needs the samples near the "Decision Boundary"; the samples that take part in calculating the decision boundary are the "Support Vectors", in the sense that they "support the calculation"

  • Support Planes:

    the imaginary lines obtained by offsetting the "Decision Boundary" by a positive/negative amount, i.e. the lines passing through the "Support Vectors"

    samples lying beyond these planes can be classified with higher confidence

  • Weight Vector/Feature Vector/Bias:

    "Decision Boundary" is a line, so we can present it as

    ax + by + c = 0

    or, more formally,

    w1*x1 + w2*x2 + b = 0  =>  w^T x + b = 0

    where

    w^T = [w1, w2]
    x = [x1, x2]
    w is the "Weight Vector"
    x is the "Feature Vector"
    b is the "Bias"

    if the sample data is not 2-dimensional, the length of w and x can be n

    w = [w1, w2 ... wn], x = [x1, x2 ... xn]
  • C:

    a constant chosen from the sample distribution or from experience;
    just like the k of kNN, it is a magic number most of the time

  • ξ:

    the error value of a misclassified sample, from the document:

    "It is the distance from its corresponding training sample to their correct decision region.

    For those who are not misclassified .... their distance is zero."

it means:

  1. correctly classified: ξ = 0
  2. misclassified: ξ = distance to the "Support Planes"
  • Gamma:

    the parameter γ of a kernel function used while reducing the dimension to 2D (see Q4)

Q4: How to deal with “Non-Linearly Separable” ?

For data that is not 2-dimensional and cannot be divided in two with a straight line,

we can just map it to a 2D model(!!), so we can separate it with a line.

  • d < 2 (one dimension):

    map it by adding a dimension, like (x) => (x, x^2)

  • d > 2 (three or higher dimensions):

    reduce the higher dimension to 2 dimensions via a "kernel function", like the document's example

    attention:
    !! the document gets this wrong: it drops the power symbols in the line marked (*)

    2d points:
    p = (p1, p2), q = (q1, q2)

    3d points:
    ϕ(p) = (p1^2, p2^2, sqrt(2)*p1*p2),
    ϕ(q) = (q1^2, q2^2, sqrt(2)*q1*q2)

    define a "kernel function" K(p,q)

    which does a dot product between two 3d points:

    K(p,q) = ϕ(p).ϕ(q)
           = (p1^2, p2^2, sqrt(2)*p1*p2).(q1^2, q2^2, sqrt(2)*q1*q2)
           = (p1q1)^2 + (p2q2)^2 + 2p1q1p2q2    (*)
           = (p1q1 + p2q2)^2
           = (p.q)^2

    It means a dot product in three-dimensional space can be achieved by squaring the dot product in two-dimensional space.
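A quick numeric check of that identity (a minimal sketch; the two sample points are arbitrary):

```python
import numpy as np

def phi(v):
    """Map a 2D point to 3D: (v1, v2) -> (v1^2, v2^2, sqrt(2)*v1*v2)."""
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

def K(p, q):
    """Kernel: the squared dot product, computed in the original 2D space."""
    return np.dot(p, q) ** 2

p = np.array([1.0, 2.0])
q = np.array([3.0, 4.0])

# Both print 121.0: the 3D dot product equals the squared 2D dot product.
print(np.dot(phi(p), phi(q)))
print(K(p, q))
```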

Q5: What’s the main problem of SVM?

How to pick the C value

from the document:

"How should the parameter C be chosen? It is obvious that the answer to this question depends on how the training data is distributed. Although there is no general answer..."
  • formula:

    min L(w, b0) = ||w||^2 + C * ∑ ξ_i

  • Large values of C:

    1. fewer misclassification errors but a smaller margin.
    2. in this case it is expensive to make misclassification errors.
    3. since the aim of the optimization is to minimize the objective, few misclassification errors are allowed.
  • Small values of C:

    1. a bigger margin and more misclassification errors.
    2. in this case the minimization does not weight the sum term as much.
    3. so it focuses more on finding a hyperplane with a big margin (a training sketch follows this list).
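A minimal training sketch with OpenCV's SVM module (the toy samples and the C value are illustrative assumptions, not from the post):

```python
import numpy as np
import cv2

# Toy training data: two linearly separable groups; float32/int32 required.
train = np.array([[10, 10], [12, 9], [60, 62], [58, 65]], dtype=np.float32)
responses = np.array([0, 0, 1, 1], dtype=np.int32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)     # C-support vector classification
svm.setKernel(cv2.ml.SVM_LINEAR)  # linearly separable: no kernel trick needed
svm.setC(1.0)                     # the "magic number" C discussed above
svm.train(train, cv2.ml.ROW_SAMPLE, responses)

# Only samples near the decision boundary survive as support vectors.
print(svm.getUncompressedSupportVectors())

# Classify a test point; the predicted label comes back in `result`.
_, result = svm.predict(np.array([[15, 15]], dtype=np.float32))
print(result)  # [[0.]]
```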

OpenCV Machine Learning Note(I) kNN

2020-03-09

Q1: What is kNN (k-Nearest Neighbor)?

from the document

  • “kNN is one of the simplest classification algorithms available for supervised learning.”

Q2: What problem does KNN want to solve?

  • “classification algorithms” is the key point

Taking one unclassified input sample and assigning it to one of some predefined classes is classification.

Q3: What are the concepts in kNN?

  • train data:

    because kNN needs predefined pairs during the algorithm,
    the data making up those pairs is the training data

  • responses:

    responses are the classification results of the training data; the number of responses should equal the number of training samples

  • test data:

    a list of input samples; each input sample should be classified into one of the response classes

  • labels:

    the results for the test data; the same as responses, but with a different name

  • distance:

    the value used for judging neighbors; "k nearest" denotes the k minimum distances

Q4: What’s the main problem with kNN?

from the document:

"But there is a problem with that. Red Triangle may be the nearest. But what if there are a lot of Blue Squares near to him? "

Even if the numbers of red and blue samples are equal, their spatial distributions may not be. Is that important? (a usage sketch follows this list)

  1. how to choose the right k?
  2. are we supposing all k neighbors have equal importance? Is that fair?
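A minimal sketch of the concepts above, adapted from the tutorial's red/blue example (the data is random, so the output varies per run):

```python
import numpy as np
import cv2

# 25 random 2D training points, each randomly labelled 0 (red) or 1 (blue).
train = np.random.randint(0, 100, (25, 2)).astype(np.float32)
responses = np.random.randint(0, 2, (25, 1)).astype(np.float32)

knn = cv2.ml.KNearest_create()
knn.train(train, cv2.ml.ROW_SAMPLE, responses)

# Classify one newcomer by its k = 3 nearest neighbours.
newcomer = np.random.randint(0, 100, (1, 2)).astype(np.float32)
ret, results, neighbours, dist = knn.findNearest(newcomer, 3)
print(results)     # the label assigned to the newcomer
print(neighbours)  # labels of its 3 nearest training samples
print(dist)        # their squared distances
```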

OpenCV Camera Note(III) Epipolar Geometry

2020-03-08

Q1: What problem does epipolar geometry want to solve?

Use more than one camera to recover the depth information that is lost when taking an image with a pin-hole camera.

Q2: How to understand what "epipolar" means?

the prefix "epi" is from Greek epi 'upon, near to, in addition', which denotes the concept of "space around"

“polar” from the word “pole”

“pole” means “a long, slender, rounded piece of wood or metal, typically used with one end placed in the ground as a support for something”

so the “polar” in geometry field is explained in “the straight line joining the two points at which tangents from a fixed point touch a conic section.”

so the final reading of "epipolar geometry" is: the geometry that uses thin lines (polar) to find spatial information (epi-)

Q3: What are the concepts in this algorithm?

  • EPILINE:

    the document refers to:

    "The projection of the different points on OX form a line on the right plane (line l′)."

    those points are the projections on the right image, i.e. pixels in image coordinates; the epiline corresponds to the point x on the left image, and the relation can be described by the "epipolar constraint"

  • EPIPOLAR CONSTRAINT:

    from the document

    "It means, to find the point x (correspond pixel location) on the right image,search along this epiline. It should be somewhere on this line"

    the document adds

    "Think of it this way, to find the matching point in other images, you need not to search the whole image, just search along the epiline. So it provides better performance and accuracy"
  • EPIPOLAR PLANE:

    every point has its corresponding epiline in the other image; the plane containing the two camera centers and the 3D point is the epipolar plane. Look at the figure in the document to understand

  • EPIPOLE:

    the pixel location in the left image at which the right camera center projects is called the "epipole"

Q4: How to find epipolar lines and epipoles above?

to find them, we need two more ingredients, Fundamental Matrix (F) and Essential Matrix (E).

Q5: The difference between Fundamental & Essential

  • Essential Matrix :

    contains information about translation and rotation, which describes the location of the second camera, relative to the first in global coordinates.

  • Fundamental Matrix :

    contains the same information as Essential Matrix in addition to the information about the intricacies of both cameras, so that we can relate the two cameras in pixel coordinates.

    (If we are using rectified images and normalize the point by dividing by the focal lengths, F=E).

Because the Essential Matrix's information is a subset of the Fundamental Matrix's, we can say the Fundamental Matrix F maps a point in one image to a line (epiline) in the other image, as the sketch below shows.
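A minimal sketch of putting F to work (pts_left and pts_right are assumed, hypothetically, to be matched keypoint coordinates from a stereo pair, e.g. SIFT matches after a ratio test):

```python
import numpy as np
import cv2

# Matched keypoint coordinates from the left/right images are assumed given.
pts1 = np.int32(pts_left)   # N x 2 points in the left image
pts2 = np.int32(pts_right)  # N x 2 points in the right image

# Estimate the Fundamental Matrix; RANSAC marks outlier matches in `mask`.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
pts1 = pts1[mask.ravel() == 1]
pts2 = pts2[mask.ravel() == 1]

# F maps points of the right image (index 2) to epilines in the left image;
# each returned row is (a, b, c) of the line ax + by + c = 0.
lines1 = cv2.computeCorrespondEpilines(pts2.reshape(-1, 1, 2), 2, F)
lines1 = lines1.reshape(-1, 3)
```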


OpenCV Camera Note(II) Pose Estimation

2020-03-07

Q1: Why do we need pose estimation?

The document refers to "how the object is situated in space, like how it is rotated", which means knowing how an object is placed in 3D space while it is rendered on a 2D plane image. The core idea is converting a 3D point to a 2D pixel point, i.e. a projection from 3D coordinates to 2D coordinates.

Q2: How to estimate?

Convert the problem to "where is the camera positioned in 3D space, if it shot the image (chessboard) perpendicular to Z and parallel to the XY plane?" Then, as the document says, "we can assume Z=0, such that, the problem now becomes how the camera is placed in space to see our pattern image."

Q3: How does it work?

  1. we just prepare some vertices
  2. find the rotation & translation vectors from the camera matrix & distortion coefficients
  3. project the vertices to pixels using those vectors
  4. draw those vertices (a minimal sketch follows this list)
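A minimal sketch of those four steps (mtx and dist are assumed to come from the calibration note; the file name and board size are hypothetical):

```python
import numpy as np
import cv2

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# Step 1: the vertices. Chessboard corners in the world, with Z = 0, plus a
# 3D axis to draw (the -3 makes the Z arrow point out of the board).
objp = np.zeros((6 * 7, 3), np.float32)
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)
axis = np.float32([[3, 0, 0], [0, 3, 0], [0, 0, -3]]).reshape(-1, 3)

img = cv2.imread('chessboard.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, corners = cv2.findChessboardCorners(gray, (7, 6), None)
if ret:
    corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    # Step 2: rotation & translation vectors from camera matrix & distortion.
    ret, rvec, tvec = cv2.solvePnP(objp, corners2, mtx, dist)
    # Step 3: project the 3D axis vertices to pixel coordinates.
    imgpts, _ = cv2.projectPoints(axis, rvec, tvec, mtx, dist)
    # Step 4: drawing imgpts onto img is left to a draw() helper.
```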

OpenCV Camera Note(I) Calibrate

2020-03-06

Q1: Why do we need to calibrate?

Because photos come out of the camera distorted; the cause is light refraction.

Q2: What kinds of distortion exist?

Two major kinds of distortion are RADIAL DISTORTION(径向畸变) and TANGENTIAL DISTORTION(切向畸变).

Q3: How does each distortion affect the image?

  1. Radial distortion causes straight lines to appear curved.
  2. Tangential distortion causes “some areas in the image may look nearer than expected.”

Q4: What causes distortion?

The design-level reason is the pinhole camera model; the physical reason is light refraction

  1. radial distortion occurs because light rays travel different path lengths to the pinhole under different refraction
  2. tangential distortion occurs because the image-taking lens is not aligned perfectly parallel to the imaging plane.

Q5: What is needed to correct distortion?

We need the intrinsic and extrinsic parameters

  • Intrinsic parameters are specific to a camera.
  • Extrinsic parameters correspond to rotation and translation vectors,

which translate the coordinates of a 3D point to the camera's coordinate system.

Q6: What do intrinsic parameters contain?

They include information like the focal length (fx, fy) and the optical centers (cx, cy). A calibration sketch follows.
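A minimal sketch of recovering these parameters (objpoints, imgpoints, gray and img are assumed to be prepared beforehand as in the tutorial: 3D corners, their detected 2D pixels, and the calibration images):

```python
import cv2

# objpoints: 3D chessboard corners; imgpoints: detected 2D pixel locations;
# gray: one calibration image in grayscale. All assumed prepared beforehand.
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)

# The intrinsic matrix mtx holds the focal length and optical center:
#     [[fx,  0, cx],
#      [ 0, fy, cy],
#      [ 0,  0,  1]]
print(mtx)
print(dist)  # distortion coefficients (k1, k2, p1, p2, k3)

# rvecs/tvecs are the extrinsic rotation & translation vectors per image.
# Undistort a new image with the recovered parameters:
undistorted = cv2.undistort(img, mtx, dist)
```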


OpenCV Feature Note(V) ORB

2020-03-05

Paper:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.370.4395&rep=rep1&type=pdf

ORB

The full name of the ORB algorithm is "Oriented FAST and Rotated BRIEF"

which means

  1. ORB uses a FAST-like detector to find potential corners; the document calls it "basically a fusion of FAST"
  2. ORB uses BRIEF-like descriptors to describe keypoints; the document calls the result "rBRIEF"
  3. the FAST algorithm isn't rotation invariant, so ORB adds an orientation to it
  4. the BRIEF descriptor isn't as good as the original after rotation, so ORB uses a greedy search to solve that

Q1: How to assign the FAST algorithm an orientation?

Use the vector from the "center" point to the "intensity centroid" point as the orientation.

Use image moments to calculate the "intensity centroid"; refer to 03-contours-feature.

To obtain rotation invariance,

ORB must rotate BRIEF to the dominant orientation (called "steering" in the document),

but BRIEF loses accuracy after rotating.

Q2: Why is BRIEF sensitive to rotation?

BRIEF has a large variance and a mean near 0.5.

Once it is rotated (steered) to the dominant orientation, the test points across the patch,

taken relative to that orientation, will look similar, so the variance decreases.

Q3: How to improve the rotated (steered) BRIEF result?

The paper uses a learning method based on the PASCAL 2006 data set

  1. Run each test against all training patches.
  2. Order the tests by their distance from a mean of 0.5, forming the vector T.
  3. Greedy search

then the document says

"a greedy search among all possible binary tests to find the ones that have both high variance and means close to 0.5"

and the final result is rBRIEF

Q4: What about the scale invariance of ORB?

The paper talks only a little about this:

"FAST does not produce multi-scale features. We employ a scale pyramid of the image, and produce FAST features (filtered by Harris) at each level in the pyramid."

but gives no further detail.

IN CONCLUSION:

The ORB algorithm adds orientation to FAST, uses Harris filtering on an image pyramid, and solves the rotation sensitivity of BRIEF. A minimal usage sketch follows.
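A minimal usage sketch (the image path and the nfeatures value are my own choices):

```python
import cv2

img = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)

# nfeatures caps the keypoint count; the default Harris score matches the
# "filtered by Harris" pyramid step described above.
orb = cv2.ORB_create(nfeatures=500)

kp = orb.detect(img, None)      # oriented FAST keypoints
kp, des = orb.compute(img, kp)  # rBRIEF descriptors, 32 bytes per keypoint

out = cv2.drawKeypoints(img, kp, None, color=(0, 255, 0), flags=0)
cv2.imwrite('orb_keypoints.jpg', out)
```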


OpenCV Feature Note(IV) BRIEF

2020-03-04

Paper:

http://www.gwylab.com/download/BRIEF_2010.pdf

Trait

The BRIEF algorithm is a descriptor algorithm; unlike SIFT/SURF,

  1. BRIEF does not care about how to find potential corners
  2. BRIEF only contains a descriptor design
  3. BRIEF can be combined with FAST keypoint results

the essence of BRIEF is SOLVING THE MEMORY PROBLEM

as the document says about SIFT/SURF:

"Creating such a vector for thousands of features takes a lot of memory which are not feasible for resource-constraint applications especially for embedded systems."

How

How to save the memory?

  1. Replace floating-point computation with Hamming-distance comparison on binary streams.
  2. Don't generate an overly complicated descriptor in memory.

Method 1 is also useful in the SIFT/SURF process, BUT ...

"we need to find the descriptors first, then only can we apply hashing, which doesn't solve our initial problem on memory."

Method 2 actually solves the memory problem.

There are 3 main steps, referring to the official document:

  1. select a set of n_d (x, y) location pairs around the keypoint in a unique way
  2. compare the pixel intensities of the points p, q in each pair, and record the result as 0/1
  3. take the resulting binary stream (binary string) as the descriptor

about the "around":

  1. the paper defines it as an S x S area
  2. in OpenCV the default value is S = 31

about the "unique way", the details are explained in the paper; in summary:

  1. uniform (mean-value) sampling
  2. p, q sampled from the same Gaussian distribution
  3. p, q sampled from different Gaussian distributions
  4. sampling on a polar coordinate grid
  5. p fixed at (0, 0), q sampled around p

Orientation Sensitivity:

From the paper we know:

"BRIEF is not designed to be rotationally invariant....it tolerates small amounts of rotation"

From experiments, the rotation angle should be less than about 30 degrees.

IN CONCLUSION:

The BRIEF descriptor does not use the raw data itself as a descriptor; it uses results computed from the data as the descriptor. A minimal usage sketch follows.
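A minimal sketch pairing BRIEF with a FAST detector (BriefDescriptorExtractor lives in the opencv-contrib xfeatures2d module; the image path is hypothetical):

```python
import cv2

img = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)

# BRIEF only describes, so pair it with a detector such as FAST (point 3).
fast = cv2.FastFeatureDetector_create()
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()

kp = fast.detect(img, None)
kp, des = brief.compute(img, kp)

print(brief.descriptorSize())  # 32 bytes = 256 binary tests per keypoint
print(des.shape)               # (number of keypoints, 32)
```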


OpenCV Feature Note(III) FAST

2020-03-03

Paper:

http://dev.ipol.im/~reyotero/bib/bib_all/2008_Roster_Porter_Drummond_faster_better_corner_detection.pdf

Differ

The FAST algorithm differs from SIFT/SURF in two points:

  1. FAST focuses on finding potential corners, fast enough for real time
  2. FAST does not contain a descriptor design like SIFT does

The FAST algorithm has two versions, from 2006 and 2010:

  • 2006: it just gives a proposal on how to search for a corner point
  • 2010: it improves the accuracy with machine learning

HERE WE ILLUSTRATE SOME IMPORTANT SENTENCES FROM THE ARTICLE

SENTENCE 1:

From the article, we know the essential definition:

"The original detector classifies p as a corner, 

if there exists a set of n contiguous pixels in the circle

which are all brighter than the intensity of the candidate pixel Ip

plus a threshold t, or all darker than Ip − t"
  • comment:

n: n = 12 in this case

Ip: the intensity of the pixel (point) p, i.e. its gray value

t: the threshold; the filter is [0, Ip − t) & (Ip + t, 255]

SENTENCE 2:

Now the problem becomes "how to detect the contiguous pixels fast"; the answer is to first examine the diagonal points among the 16 pixels

the article said:

"The high-speed test examines pixels 1 and 9.

If both of these are within t of Ip, then p can not be a corner.

If p can still be a corner, pixels 5 and 13 are examined.

If p is a corner then, at least three of these must all be brighter than Ip + t or darker than Ip − t.

If neither of these is the case, then p cannot be a corner."

BUT THERE ARE SOME WEAKNESSES

(for more details, please read the article & document)

  1. not good when n != 12 (article)
  2. the pixel choice is not optimal (article)
  3. the high-speed test results are thrown away (document)
  4. multiple features detected too close to one another (article)

HOW TO SOLVE THESE WEAKNESSES

  • 1, 2, 3: use the machine-learning ID3 algorithm to create a decision tree

  • 4: use non-max suppression to refine the result

ID3 algorithm:

  1. train on a set of images containing keypoints
  2. create a table of each point, its surrounding intensities, and a new bool value indicating whether it is a keypoint
  3. use the formula Hg = H(P) − H(Pd) − H(Ps) − H(Pb) to calculate the information gain
  4. once the "decision tree" is created, it can be used in FAST

non-max suppression:

sum the intensity differences around every p as a score, then keep only the maximum within a given distance (a minimal detector sketch follows)
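A minimal detector sketch (the image path and threshold are illustrative):

```python
import cv2

img = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)

# threshold is the t from the definition above; nonmaxSuppression applies
# the fix for weakness 4.
fast = cv2.FastFeatureDetector_create(threshold=10, nonmaxSuppression=True)
kp = fast.detect(img, None)
print(len(kp))

# Turning suppression off shows how many clustered corners it was removing.
fast.setNonmaxSuppression(False)
print(len(fast.detect(img, None)))
```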



What I write is, most likely, wrong.....