Recent comments in /f/MachineLearning

jostmey t1_j4vmqcs wrote

A deep neural network trained by backpropagation with gradient descent, or even stochastic gradient descent, will converge to a local minimum (strictly speaking, to a stationary point, given a small enough or decaying learning rate). However, many components added to deep neural networks, like dropout and batch normalization, do not, as far as I know, come with convergence guarantees.

There are no guarantees of finding a global minimum.
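To make the "no global guarantee" point concrete, here is a minimal sketch on a hypothetical 1-D loss f(x) = x^4 - 3x^2 + x (an illustrative assumption, not a real network loss). Plain gradient descent with the same learning rate ends up in a different minimum depending only on where it starts, and only one of those is the global minimum.

```python
# Toy 1-D non-convex loss with two local minima (illustrative assumption).
def loss(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    # Derivative of loss(x).
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

right = gradient_descent(2.0)   # settles in the shallower local minimum (x ≈ 1.13)
left = gradient_descent(-2.0)   # settles in the global minimum (x ≈ -1.30)
```

Both runs satisfy the "converges to a minimum" part, but nothing in the update rule steers the run started at x = 2 toward the lower minimum.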


tdgros t1_j4vipol wrote

If the two cameras are rigidly fixed, then you can calibrate them the way one calibrates a stereo pair, and at least align the orientation and intrinsics. Points very far from the cameras will then be well aligned, but points very close to them will remain misaligned, because of the parallax caused by the distance between the two cameras.

The calibration process will involve marking corresponding positions by hand, but the math for the correction is very simple after that.
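A sketch of why the correction is simple once calibration is done: for very distant points, pixels of one camera map to the other through a single 3x3 homography H = K_dst · R · K_src^-1, where K_src and K_dst are the intrinsic matrices and R is the rotation from source-camera to destination-camera coordinates. The function names below are hypothetical, and the sketch assumes K and R come from a stereo calibration (for example OpenCV's stereoCalibrate). It ignores the translation between the cameras, which is exactly why nearby points still won't line up.

```python
import numpy as np

def infinite_homography(K_src, K_dst, R):
    """3x3 homography mapping source-camera pixels to destination-camera pixels
    for very distant points. R rotates source-camera coordinates into
    destination-camera coordinates; the camera translation is ignored."""
    return K_dst @ R @ np.linalg.inv(K_src)

def map_pixels(H, pts):
    """Apply H to an (N, 2) array of pixel coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # back to Cartesian coordinates
```

To warp a whole image rather than a handful of points, the same H can be passed to an image-warping routine such as OpenCV's warpPerspective.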


nmkd t1_j4vh7cl wrote

Okay, in that case, I'll try to be a bit more helpful lol.

I think you absolutely need to use something like YOLO for object identification/classification.

  • Humans and animals are warmer than the environment

  • Cars and other vehicles are warmer than the environment

  • Glass blocks thermal (long-wave) IR but not visible light

You could get the overall "look" with purely image-based networks, but to make it really convincing (more like COD's thermal vision), you need classification so that objects that are supposed to be hot actually look hot.
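A minimal sketch of how detections could feed into a generated IR image, assuming a detector (YOLO or similar) returns (label, x0, y0, x1, y1) boxes in pixel coordinates and the predicted IR frame is a float array in [0, 1]. The class names, the fixed boost, and the function name are hypothetical choices for illustration, not part of any YOLO API.

```python
import numpy as np

# Hypothetical labels for things listed above as warmer than the environment.
HOT_CLASSES = {"person", "dog", "horse", "car", "truck", "bus"}

def boost_hot_objects(ir, detections, boost=0.3):
    """Brighten detected warm objects in an HxW IR image with values in [0, 1].

    detections: iterable of (label, x0, y0, x1, y1) pixel boxes (x1, y1 exclusive).
    Returns a new array; the input image is left unchanged.
    """
    out = ir.copy()
    for label, x0, y0, x1, y1 in detections:
        if label in HOT_CLASSES:
            region = out[y0:y1, x0:x1]
            out[y0:y1, x0:x1] = np.clip(region + boost, 0.0, 1.0)
    return out
```

A box is a crude mask; an instance-segmentation mask would avoid heating the background inside each box.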


kingdroopa OP t1_j4vguxt wrote

The GAN models I've tested are based on the 'unaligned' approach (e.g. CycleGAN). I haven't yet tried cropping and resizing the images so that they show the same region. My immediate concern is that the top and bottom of both images might be lost, but perhaps that's still OK?


BlazeObsidian t1_j4vc61q wrote

I think that depends on how badly the pixels are misaligned. If cropping your images is not an option and a large portion of them have this issue, the model won't be able to generate the right pixel information for the misaligned regions. But if the misalignment is not significant, it's worth giving Palette a try.
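One rough way to quantify how misaligned a pair is, assuming the misalignment is mostly a global translation, is phase correlation. This technique is not from the comment; across RGB and IR the raw intensities differ, so running it on grayscale or edge-map versions of both images is an assumption that may or may not work well on your data.

```python
import numpy as np

def estimate_shift(a, b):
    """Estimate the integer (dy, dx) translation such that a ≈ np.roll(b, (dy, dx)),
    using phase correlation on two same-sized 2-D arrays."""
    Fa = np.fft.fft2(a)
    Fb = np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12          # keep only the phase difference
    corr = np.fft.ifft2(cross).real         # sharp peak at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    # Peaks past the halfway point correspond to negative shifts (FFT wraps around).
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

A shift of a few pixels suggests Palette is worth trying as-is; a large or spatially varying offset points back toward cropping or calibration.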


BlazeObsidian t1_j4var74 wrote

Sorry, I was wrong. Modern deep VAEs can match SOTA GAN performance for image super-resolution (https://arxiv.org/abs/2203.09445), but I don't have evidence for recoloring.

But diffusion models have been shown to outperform GANs on multiple image-to-image translation tasks, e.g. https://deepai.org/publication/palette-image-to-image-diffusion-models

You could probably reframe your problem as an image colorization task (https://paperswithcode.com/task/colorization), where the SOTA is still Palette, linked above.


JacksOngoingPresence t1_j4v9jh4 wrote

There isn't much difference between "Simply 👍/👎" and "scale of 1-5"; they will probably give about the same results. I think of the first as labels in {0, 1} and the second as labels in {0, 0.25, 0.5, 0.75, 1} (after rescaling 1-5 onto [0, 1]). It's just a question of resolution. The 1-5 scale will most likely give you faster convergence, but it can also f you up if some of your data gets mislabeled, since it's easier to make mistakes at higher resolution.

But in the limit, if you take 1 million different people and ask them to assess your model in a binary fashion, or on a scale of 1 to 10, and then average out the results, you will get essentially the same thing. It's just that, from a human perspective, it's easier to assess things as yes/no (e.g., "did you like this new movie?" vs "how would you rate this movie on a scale from 1 to 10?"). From the computer's perspective, though, ML wants each label to be as close to its true value as possible.
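A small simulation can illustrate both points, under a deliberately simple rater model that is an assumption here, not something from the comment: each binary vote is 👍 with probability q, and each 1-5 rating is 1 + Binomial(4, q), rescaled to [0, 1] with (r - 1) / 4. Under that model both averages converge to q, while the rescaled 1-5 labels are less noisy per rating (variance q(1 - q)/4 instead of q(1 - q)), which is one way to read the "faster convergence" remark. Real raters are rarely this well behaved.

```python
import numpy as np

def rescale_rating(r, lo=1, hi=5):
    """Map a rating on the lo..hi scale onto [0, 1]."""
    return (r - lo) / (hi - lo)

def simulate(q=0.7, n=200_000, seed=0):
    """Hypothetical raters judging a model of true quality q.

    Returns (binary labels, rescaled 1-5 labels), both with values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    binary = rng.binomial(1, q, size=n).astype(float)  # thumbs up = 1, thumbs down = 0
    five_point = 1 + rng.binomial(4, q, size=n)        # integer ratings 1..5
    return binary, rescale_rating(five_point)

binary, scaled = simulate()
```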


Anjum48 t1_j4v8mpm wrote

+1 for UNets. Since IR is a single channel, you could use a single-class semantic-segmentation-type model (i.e. a UNet with a 1-channel output passed through a sigmoid, which means the IR targets should be scaled to [0, 1]). Something like this would get you started:

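import segmentation_models as sm

# UNet with a ResNet-34 encoder and one sigmoid output channel (predicted IR intensity in [0, 1])
model = sm.Unet('resnet34', classes=1, activation='sigmoid')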

Edit: Forgot the link for the package I'm referencing: https://github.com/qubvel/segmentation_models

Many of the most popular encoders/backbones are implemented in that package.
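A minimal preprocessing sketch to go with a model like that, with the assumptions flagged: these encoder-decoder UNets downsample the input five times by 2, so height and width are assumed to need to be divisible by 32, and the sigmoid output means IR targets should live in [0, 1]. The sensor range lo/hi is a hypothetical parameter you would pick from your own IR data, and both helper names are made up for illustration.

```python
import numpy as np

def pad_to_multiple(img, multiple=32):
    """Zero-pad an HxWxC array on the bottom/right so H and W are divisible by `multiple`."""
    h, w = img.shape[:2]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)))  # constant (zero) padding

def scale_ir(ir_raw, lo, hi):
    """Map raw IR values in [lo, hi] to [0, 1] so they match a sigmoid output."""
    return np.clip((ir_raw.astype(np.float32) - lo) / (hi - lo), 0.0, 1.0)
```

Padding rather than resizing keeps the pixel grid of RGB and IR identical, which matters when the two are already aligned; the padded border can be cropped off the prediction afterwards.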

Edit 2: Is the FOV important? If you could resize the images so that the RGB and IR FOVs are equivalent, that would make things a lot simpler.
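A sketch of the cropping half of that idea, under idealized assumptions that are not stated in the thread: both cameras are pinhole cameras with principal points at the image center and parallel optical axes, and the scene is far enough away that the offset between the cameras doesn't matter. Along each axis, the image with the wider FOV is center-cropped to the narrower camera's FOV; the crop can then be resized to the other image's resolution (for example with PIL's Image.resize). crop_span is a hypothetical helper name.

```python
import math

def crop_span(size, fov_wide_deg, fov_narrow_deg):
    """Return (start, end) pixel indices, along one axis of the wider-FOV image,
    of the centered window covering the narrower camera's FOV.
    Assumes ideal pinhole cameras with centered principal points."""
    # Image-plane extent scales with tan(fov / 2), not with the angle itself.
    ratio = math.tan(math.radians(fov_narrow_deg) / 2) / math.tan(math.radians(fov_wide_deg) / 2)
    span = round(size * ratio)
    start = (size - span) // 2
    return start, start + span

# Example: a 640x480 RGB frame with a 90x70 degree FOV and a 50x40 degree IR camera.
x0, x1 = crop_span(640, 90, 50)
y0, y1 = crop_span(480, 70, 40)
```

If one camera is wider horizontally but the other is wider vertically, each axis is cropped in a different image, and that is where strips such as the top and bottom get lost.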
