Good development boards/cameras for programming vision algorithms where low-level access to pixel data is well documented

I am interested in getting into writing my own computer vision math, in the context of a device that can actually move/be moved around the world (i.e. not processing stored image files post hoc, but feeding data from a camera directly into code). I want to code things from scratch so that I learn the math and can modify it however I like; in other words, I won’t simply be “piping” data into existing third-party vision libraries/high-level algorithms through a “glue language”. So I need a well-documented way to get an array of pixels that I can loop over, and an easy-to-set-up toolchain for programming whatever device I’m using at a relatively low level.

Some people on Reddit have suggested the ESP32-CAM. It’s extremely inexpensive, but I’m not sure how much processing it’s really capable of. Then there are boards like the Raspberry Pi, and at the high end of the price range are NVIDIA’s boards (the Jetson line) with powerful GPUs. In trying to determine which is the best fit, I have some general questions about speed limitations/architecture.

  1. When connecting a CPU/microcontroller to a camera, what is really the limiting factor in getting frames from the sensor? I was surprised to read in the ESP32-CAM documentation that some capture modes are only usably fast when the sensor outputs JPEG rather than individual pixel intensities. This suggests that having the sensor encode JPEG, and then having the microcontroller spend math operations decoding the JPEG back to pixel values, is actually faster than sending the pixel data directly! Does this mean that copying data from the sensor to the CPU’s memory is slow relative to actually processing it (for instance doing convolutions, Fourier transforms, etc.)?
  2. Similarly, when algorithms are split between a CPU and a GPU (in cases where the board actually has one), how much slowdown does transferring the data add? Is it even worth it unless many millions of mathematical operations are performed each frame?
  3. On the Raspberry Pi, how well documented are the ways to grab a pixel buffer from the camera, send it to the GPU, etc.? Some reading suggests that the framework for doing this is called “libcamera”; however, it seems quite poorly documented except for how to set image parameters (contrast, white balance, etc.).
  4. Some people have suggested that using a high-level library like OpenCV for input/output, even if I use it only for that and then pass the pixel buffers to my own algorithms, will cut down a lot on the code needed to set up the context in which the algorithms run. Is this in fact true? And if so, does it affect which hardware platforms are easiest to get up and running?

Hi @arosko!

Welcome to the RobotShop Community!

Although I don’t have the answer, perhaps someone from the community will know and answer here.

That’s funny, I’m virtually sure that two people responded before you, but their posts are now gone. At least I was able to read them, though.


Hi @arosko!

Are you sure it is this post?

Because I see no previous history.

I was virtually sure it was here. The only other place I asked the same question was on Reddit, and while I did get some replies there, the ones I’m referring to aren’t there either. I’m actually almost sure I even replied to one of the responses I got here.


So it was probably on Reddit, since here I am the only one replying to you 🙂