TCS3200 color sensor with k-Nearest Neighbor classification algorithm

The k-NN (k-nearest neighbors) algorithm is among the simplest of all machine learning algorithms. We will use it to classify the color of a new object. As our training set of learned colors contains only one sample per color, k = 1, and the object is simply assigned to the class of that single nearest neighbor.

We will work in a 3-dimensional Euclidean space ℝ³. The x-axis represents the ratio of the clear channel frequency to the red channel frequency, the y-axis the ratio of the clear channel frequency to the green channel frequency, and the z-axis the ratio of the clear channel frequency to the blue channel frequency (see the TCS3200 data sheet for more details).

The image below illustrates the 3-dimensional Euclidean space, the trained colors (white, orange, yellow, red, green and blue points) and a new sample (black point) whose color needs to be classified. The black lines are the so-called Euclidean distances from the new sample point to every trained color point. All we need to do is find the smallest distance.

In general, for an n-dimensional Euclidean space ℝⁿ, the Euclidean distance between two points p = (p₁, p₂, …, pₙ) and q = (q₁, q₂, …, qₙ) is given by

d(p, q) = √((q₁ − p₁)² + (q₂ − p₂)² + … + (qₙ − pₙ)²)
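For illustration, the distance formula translates directly into code. Here is a minimal host-side C++ sketch (not Arduino-specific; the function name is mine):

```cpp
#include <cmath>
#include <cstddef>

// Euclidean distance between two points p and q in n-dimensional space:
// d(p, q) = sqrt((q1 - p1)^2 + ... + (qn - pn)^2)
double euclidean_distance(const double* p, const double* q, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double diff = q[i] - p[i];
        sum += diff * diff;  // accumulate squared coordinate differences
    }
    return std::sqrt(sum);
}
```

For example, the distance between (0, 0, 0) and (3, 4, 0) is 5, the familiar 3-4-5 triangle.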

Besides many other applications, this color machine learning algorithm could be used for automated test strip reading:

img3.jpg

Image credit http://www.terumo.com 

Hardware Required:

  • Arduino board or other microcontroller
  • TCS3200 breakout board
  • Hook-up wire

Circuit:

Note: unlike the board shown, I have used a TCS3200 breakout board with a collimator lens.

Example:

 

byte LED = 2, S0 = 3, S1 = 4, S2 = 5, S3 = 6, OUT = 7;
const char* color[] = {"white", "orange", "yellow", "red", "green", "blue", "out of range"};
char point[] = "Point color sensor to ";
char object[] = " object";
char submit[] = "Submit 'ok' if it's done";
int learned_colors[3][6];
int PercentageRed, PercentageGreen, PercentageBlue;
float OutOfRange;

void setup() {
  Serial.begin(9600);
  TCS3200_Setup();
}

void loop() {
  learning_mode();
  while(1) {
    new_sample();
    classify();
  }
}

void TCS3200_Setup() {
  pinMode(S0,OUTPUT);
  pinMode(S1,OUTPUT);
  pinMode(S2,OUTPUT);
  pinMode(S3,OUTPUT);
  pinMode(LED,OUTPUT);
  pinMode(OUT,INPUT);
}

void TCS3200_On() {
  digitalWrite(LED,HIGH); // switch LED on
  digitalWrite(S0,HIGH); // output frequency scaling (100%)
  digitalWrite(S1,HIGH);
  delay(5);
}

void TCS3200_Off() {
  digitalWrite(LED,LOW); // switch LED off
  digitalWrite(S0,LOW); // power off sensor
  digitalWrite(S1,LOW);
}

void NoFilter() {
  digitalWrite(S2,HIGH); // select no filter
  digitalWrite(S3,LOW);
  delay(5);
}

void RedFilter() {
  digitalWrite(S2,LOW); // select red filter
  digitalWrite(S3,LOW);
  delay(5);
}

void GreenFilter() {
  digitalWrite(S2,HIGH); // select green filter
  digitalWrite(S3,HIGH);
  delay(5);
}

void BlueFilter() {
  digitalWrite(S2,LOW); // select blue filter
  digitalWrite(S3,HIGH);
  delay(5);
}

void input() {
  String inputBuffer = "";
  while (inputBuffer != "ok") {
    inputBuffer = "";
    while (Serial.available() > 0) {
      char ch = Serial.read();
      inputBuffer += ch;
      delay(10);
    }
    inputBuffer.trim(); // strip any newline sent by the serial monitor
  }
}

void getColor() {
  float FrequencyClear, FrequencyRed, FrequencyGreen, FrequencyBlue;
  TCS3200_On();
  NoFilter();
  FrequencyClear = float(pulseIn(OUT,LOW,10000));
  RedFilter();
  FrequencyRed = float(pulseIn(OUT,LOW,10000));
  GreenFilter();
  FrequencyGreen = float(pulseIn(OUT,LOW,10000));
  BlueFilter();
  FrequencyBlue = float(pulseIn(OUT,LOW,10000));
  TCS3200_Off();
  PercentageRed = int((FrequencyClear / FrequencyRed) * 100.0);
  PercentageGreen = int((FrequencyClear / FrequencyGreen) * 100.0);
  PercentageBlue = int((FrequencyClear / FrequencyBlue) * 100.0);
  OutOfRange = 500.0 / FrequencyClear;
}

void learning_mode() {
  for(byte i = 0; i < 6; i ++) {
    Serial.print(point);
    if(i == 1) Serial.print("an ");
    else Serial.print("a ");
    Serial.print(color[i]);
    Serial.println(object);
    Serial.println(submit);
    Serial.println("");
    input();
    getColor();
    learned_colors[0][i] = PercentageRed;
    learned_colors[1][i] = PercentageGreen;
    learned_colors[2][i] = PercentageBlue;
  }
}

void new_sample() {
  Serial.print(point);
  Serial.print("a new");
  Serial.println(object);
  Serial.println(submit);
  Serial.println("");
  input();
  getColor();
}

void classify() {
  int i_color;
  int ClosestColor = 6; // default: out of range, in case no neighbor is close enough
  float MinDiff = 100.0;
  if(OutOfRange < 1.5) ClosestColor = 6; // object out of range
  else {
    // find nearest neighbor, k = 1
    for (i_color = 0; i_color < 6; i_color ++) {
      // compute Euclidean distance to trained color i_color
      float ED = sqrt(pow((learned_colors[0][i_color] - PercentageRed),2.0) +
      pow((learned_colors[1][i_color] - PercentageGreen),2.0) + pow((learned_colors[2][i_color] - PercentageBlue),2.0));
      // keep the minimum distance
      if (ED < MinDiff) {
        MinDiff = ED;
        ClosestColor = i_color;
      }
    }
  }
  Serial.print("Object is "); // print result
  Serial.println(color[ClosestColor]);
  Serial.println("");
}
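The 1-NN search inside classify() can also be tested off-board. Here is a host-side C++ sketch of the same logic (the function name nearest_color is mine; the array layout, the six trained colors and the distance threshold of 100 mirror the Arduino sketch above):

```cpp
#include <cmath>

// Host-side version of the 1-NN search in classify():
// returns the index of the trained color closest to the sample (r, g, b),
// or 6 ("out of range") if no neighbor is closer than the threshold.
int nearest_color(const int learned[3][6], int r, int g, int b) {
    int closest = 6;          // default: out of range
    float min_diff = 100.0f;  // same distance threshold as the sketch
    for (int i = 0; i < 6; ++i) {
        float dr = learned[0][i] - r;
        float dg = learned[1][i] - g;
        float db = learned[2][i] - b;
        float ed = std::sqrt(dr * dr + dg * dg + db * db); // Euclidean distance
        if (ed < min_diff) {
            min_diff = ed;
            closest = i;
        }
    }
    return closest;
}
```

A sample that sits far from every trained point falls through to index 6, which is exactly why ClosestColor needs a default value in the Arduino version.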

https://www.youtube.com/watch?v=cDvhVbhBprc

Well done…

Hey Markus, well done. Although I'm really not in need of a color sensor, what do you hope to use it for? I would love to hear that.

Test strip reading, sorting

Test strip reading, sorting objects by colors, sensing ambient light, color matching, machine vision…

Ambient Light

Can you elaborate on the effects of ambient light?

Any specific project you are

Is there any specific project you are going to post on LMR that you will use it for?

Automated test strip reading

Automated test strip reading for my medical tricorder MK III

Shall I really Google it for

Shall I really Google it for you?

PS: You can eliminate ambient light by doing two readings, one with the LEDs off and one with the LEDs on.
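That two-reading idea can be sketched like this, assuming the sensor's output frequency is roughly linear in irradiance (the function name and the Hz values are illustrative; on a real TCS3200 you would first convert the pulseIn() period into a frequency):

```cpp
// Subtract the ambient-only reading (LEDs off) from the combined reading
// (LEDs on) to recover the LED-illuminated contribution alone.
// Frequencies are in Hz; clamping avoids a negative result from noise.
double led_only_frequency(double freq_led_on, double freq_led_off) {
    double diff = freq_led_on - freq_led_off;
    return diff > 0.0 ? diff : 0.0;
}
```

This works in the frequency domain, not on the raw pulse periods, because the TCS3200's frequency output scales (approximately) linearly with light intensity.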

Thanks for posting

Thanks for posting this, you made it so clear and understandable.  Can’t wait to try it out on something.

One px camera

Now it can be used to diagnose color blindness too! I guess it could also suggest a suitable sun protection level after getting the skin tone.

Two great ideas, Silux!

Two great ideas, Silux!

Another it’s to use a photo

Another is to use a photo camera for skin conditions, soft tissue, eye irritations and other kinds of things where a visual check gives good results. An ultimate tricorder would use X-ray scans and echography, but X-ray scanning is forbidden to civilians.

In my opinion an ultimate

In my opinion an ultimate medical tricorder would be better off doing a complete blood count than an X-ray scan or sonography.

Or a backup copy so you can

Or a backup copy so you can be cloned later?

See here:

See here: http://www.mayoclinic.org/tests-procedures/complete-blood-count/basics/why-its-done/prc-20014088

A few k-NN and “Euclidean Space” Concerns

Hi Markus, I am thinking of building a generic and configurable KNearestNeighborAgent into my brain project that can be fed with sensor values of the designer's choosing. My questions revolve around how far one can go with k-NN in adding dimensions and still be valid from a math perspective.

Example 1:  Using k-NN to have the robot guess what room it is in.  (Simple case I think)

GPS is really inaccurate, but because most rooms are on the perimeter of a house, if the GPS coordinates where the robot thinks it is fall outside of the home, one room will often be closer and should be more likely than others to be the correct room.

Inputs:  GPS coordinates of the robot, and a set of coordinates for each large room in the house.

Output:  The guess of the nearest room. 

If my 2D example is valid, is there a valid way to add non-spatial dimensions to this, like size of room, temperature, color, or direction of windows?

  1. If one room has windows on the north side and one on the south, and the robot detects windows at 45 degrees (northeast), the distance would be 45 for one room and 135 for the other. Can this somehow be used as an additional dimension? It doesn't really seem to fit with Euclid.
  2. If one room is large and the other small, and the robot can estimate the size of the room it is in, can this be used?

In the color example, I could see how it worked as all the color dimensions and distances seemed like they were on an equal footing.  Basically, can I mix apples and oranges?  Do I need to (or can I) “normalize” the dimensions/distances in some way to make it work for Euclid?  Maybe I am barking up the wrong tree. 

Example 2: Emotions…less practical example but I have a question

I have seen several papers that model emotions as a kind of 10-dimensional vector space. Is k-NN suitable for classifications like this? I would think so, but my knowledge is limited. I realize I haven't spelled out the actual classification to be done; I'm just asking if the math of k-NN is appropriate in these "n-dimensional vector spaces". I have no formal training on this, my apologies if I am murdering the terminology.

I can study this online if this is too much to ask…just thought you might have some quick insights.

Regards,

Martin

Hi Martin,As I wrote

Hi Martin,

As I wrote above:

In general, for an n-dimensional Euclidean space ℝⁿ, the Euclidean distance between two points p = (p₁, p₂, …, pₙ) and q = (q₁, q₂, …, qₙ) is given by

d(p, q) = √((q₁ − p₁)² + (q₂ − p₂)² + … + (qₙ − pₙ)²)

This means you can theoretically use as many dimensions as you wish (< ∞). But practically, too many dimensions can lead to the curse of dimensionality.

As long as a function is valid in a Euclidean space ℝⁿ, you can use k-NN and its Euclidean distances. (The "IR" in the notation just means the set of all real numbers.)

Let's start with a 2-dimensional space. You have a function f(x): x is the input, f(x) the output. If you find a function where you can input an amount of apples and the function will output an amount of oranges, well, then you can "mix apples with oranges". Now, in a 3-dimensional space you have functions which depend on two variables, for example x and z. You have two inputs (x and z), but still one output: f(x,z). In- and outputs might have no units or they might have one. Just never use different units on the same axis (for instance Fahrenheit and Celsius, or millimeter and meter).

Ok, so we need a function (it can also be a sensor that produces outputs for given inputs). Let's say the temperature depends on the room. You have 2 rooms; in room #1 you measure 50F, in room #2 the temperature is 100F. That suggests a simple function: Roomnumber(T) = T[F] / 50[F/roomnumber]. (We can let the robot do the work and find an appropriate function by using linear regression etc.) We now have two 2-dimensional training data points for the robot, (50,1) and (100,2). These are our classes. Now we put the robot randomly into one of the two rooms and let it classify which room (class) it is in. The robot measures 85F: Roomnumber(85F) = 85/50 = 1.7. Therefore the new data point to classify is (85,1.7). We don't need to run a k-NN algorithm in this very simple case, as 1.7 is closer to 2 than to 1 and the function is linear. So it is more likely that the robot is in room #2.
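That room example can be sketched as a tiny 1-NN in C++ (the function name guess_room is mine; the hard-coded training points and the fitted function T/50 are just the numbers from the paragraph above):

```cpp
#include <cmath>

// 1-NN over the two 2-dimensional training points (temperature, room number).
// The sample's second coordinate comes from the fitted function
// Roomnumber(T) = T / 50.
int guess_room(double temperature_f) {
    const double training[2][2] = {{50.0, 1.0}, {100.0, 2.0}}; // (T, room)
    const double sample[2] = {temperature_f, temperature_f / 50.0};
    int best_room = 0;
    double best_dist = 1e30;
    for (int i = 0; i < 2; ++i) {
        double dt = training[i][0] - sample[0];
        double dr = training[i][1] - sample[1];
        double d = std::sqrt(dt * dt + dr * dr); // Euclidean distance
        if (d < best_dist) {
            best_dist = d;
            best_room = static_cast<int>(training[i][1]);
        }
    }
    return best_room;
}
```

With a reading of 85F the sample (85, 1.7) lies nearer to (100, 2) than to (50, 1), so the robot guesses room #2, matching the hand calculation above.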

If you have attributes which aren't discrete values or are difficult to transform into discrete values, I suggest using a Naive Bayes classifier.

PS:

A simple k-NN based emotional agent could be built using the circumplex model of affect. Based on the robot's current arousal and pleasure levels, a new data point would be added and then its nearest neighbor, and therefore the emotional state, determined.
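A minimal sketch of such an agent (note: the (arousal, pleasure) anchor points below are invented placeholders for illustration, not values taken from the circumplex model literature):

```cpp
#include <cmath>
#include <string>

// Illustrative only: these emotion anchors in (arousal, pleasure) space
// are made-up placeholders, not published circumplex coordinates.
struct Emotion { const char* name; double arousal; double pleasure; };

const Emotion anchors[] = {
    {"excited", 0.8,  0.8},
    {"calm",   -0.6,  0.6},
    {"sad",    -0.6, -0.6},
    {"angry",   0.8, -0.8},
};

// 1-NN: return the name of the anchor nearest to the current state.
std::string classify_emotion(double arousal, double pleasure) {
    std::string best = "unknown";
    double best_dist = 1e30;
    for (const Emotion& e : anchors) {
        double da = e.arousal - arousal;
        double dp = e.pleasure - pleasure;
        double d = std::sqrt(da * da + dp * dp); // Euclidean distance
        if (d < best_dist) {
            best_dist = d;
            best = e.name;
        }
    }
    return best;
}
```

The same 1-NN machinery as in the color sketch applies; only the feature space changes from color ratios to arousal and pleasure levels.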

PS II: I am just writing a tutorial about that idea.