Recent comments in /f/MachineLearning

bubudumbdumb t1_j4n08pi wrote

So basically you are printing the cards? Or you have a jpg of the cards or you can scan them?

If yes then what you can do is apply SIFT or even faster ORB to the pictures of the cards to detect and describe the salient points. Build a nearest neighbors index of the key point feature space.

(Optionally) Then you can scale the coordinates of the key points to match the intended dimensions in centimeters (or inches of that's your favorite)

Then you can perform the same with the images from your camera. Get run the key points you detect from the camera through the nn index to match each to the most similar key point from the cards. You are going to get a lot of false positives but don't worry : you can use a ransac approach to filter the matches that don't result in a consistent geometry.

The ransac procedure will return a calibrated fundamental matrix that you can use to project the rectangle of the card to the image space captured by the camera.

All the algorithms I mentioned are available in opencv (also the nn index but I dislike that since there are more modern alternatives). Also there are tutorials on how to use and visualize this stuff.

If this is geometrical gibberish to you check out the ORB paper. Figure 1, 9 and 12 should confirm whether this is the kind of matching you are looking for.

https://scholar.google.com/scholar?q=ORB:+An+efficient+alternative+to+SIFT+or+SURF&hl=en&as_sdt=0&as_vis=1&oi=scholart#d=gs_qabs&t=1673904812693&u=%23p%3DWG1iNbDq0boJ

6

hundley10 OP t1_j4mvwoj wrote

For this particular problem, the image could contain many corners - and even other full rectangles. The goal is to detect the specific type of paper card I'm interested in - easily identifiable based on color/pattern - but not easily extracted from a Sobel filter.

1

Freonr2 t1_j4mvhhf wrote

A100 and H100 are data center GPUs. Very expensive, tuned for training large models. They also use on-package HBM memory instead of GDDR on the board for improved memory bandwidth.

A100 is Ampere, same architecture as the 30xx series, but built for training with a lot more tensor cores and less focus on Cuda cores. Most often seen in SXM form factor in special servers that offers substantially more NVLink bandwidth between GPUs for multi-gpu training (and the special servers the SXM cards go into also have considerable network bandwidth for clustered training). They do make PCIe versions. Does not support FP8. Typical setup is an AGX server with 8xA100. These are a few hundred grand for the whole server, even ignoring the power and network requirements, etc to utilize it.

H100 is Hopper, newer than Ampere, but I don't believe ever made into a consumer part but perhaps closer to Ada (40xx) in features than it is to Ampere (30xx) since it has FP8. It's basically the replacement for A100, much like the 40xx is the replacement for the 30xx. These are again often in HGX server boxes for a several hundred grand. Unsure if there is a PCIe version?

Nvidia removed NVLink from the 40xx series, but its still technically available on 3090s. They're sort of segmenting the market here.

If they decide to release a 4090 with 48GB (or Ada Titan or whatever branding they decide on) it could be a monster card if you only need or want a single card, but it may also be $3k+...

12

JiraSuxx2 t1_j4muqql wrote

I’m not a 100% sure how yolo works but I think images are cut into grids and then detection is done per grid square. The results are processed, the bounding boxes are computed the old fashioned way from the predictions. That’s also how they get multiple predictions per image I think.

In your case, even if you detect corners how do you know they belong to the same card?

0

curiousshortguy t1_j4mt461 wrote

Think of chatGPT as a multi-task meta-learner where the prompt you give it specifies the task. It's essentially only trained on text generation (with some fine-tuning to make it more conversational). So you need to set-up a prompt to make it generate reasonable answers. It can't think or calculate, but by showing it how to generate a right answer in the prompt, it can leverage that information to give you better answers.

1

juniperking t1_j4mma6c wrote

>General comment: it's surprising to me that there aren't any instabilities introduced by stapling models together like this. If someone had come up to me with this description of an architecture several years ago, I would have told them that it was too complicated to work. Not sure what about my intuitions I should change in response to observing that this works despite them.

probably the most important thing that makes model configurations like this work is that they're very large and generalizable. a lot of prior research often focuses on finetuning for a specific task or dataset but the fact that clip (for example) is able to learn generalized text + image embeddings across multiple domains helps downstream training work

3

chaosmosis t1_j4mjxh9 wrote

Are the 77 token embedding vectors just concatenated together as ClipText's output? Is there any structure to their ordering as processed by the Image Information Creator? Assuming a trained model, would permuting the vectors' order before passing them forward to the next subcomponent break anything?

General comment: it's surprising to me that there aren't any instabilities introduced by stapling models together like this. If someone had come up to me with this description of an architecture several years ago, I would have told them that it was too complicated to work. Not sure what about my intuitions I should change in response to observing that this works despite them.

1

BrotherAmazing t1_j4mjntx wrote

There could be a separate database and algorithm to detect this if they wanted to, but this wasn’t a goal of chatGPT.

You wouldn’t need an AI/ML to do this, and also note it isn’t 100% impossible for a human to respond identically to chatGPT’s response, especially for shortest length responses, without knowing chatGPT would respond the same way.

Why do you “need” this? Just curious.

1

BrotherAmazing t1_j4mj44x wrote

If the whole motivation here is to detect the cheating student, most cheating students won’t simply copy and paste but will spend at least 5 - 15 min making modifications and writing some in their own language.

Policing cheating beyond punishing those who obviously are cheating in the worst ways is not as important as one might think. Cheating on highly competitive graduate school entrance exams is something to strictly police, but not an English writing assignment or a math word problem. It sounds corny, but the student in those cases really is just cheating themselves.

Any professor who talks to you in person or in class discussions or office hours, sees how you interact in group projects, and any employer who works with you on complex real-world problems chatGPT can’t solve will know very quickly that you (the cheater) don’t have a firm grasp of the material, prerequisites, or know-how to apply it, and the student or employee that does have that know-how and understanding gets the promotion, better Reference, has the better grades still (on average over all classes), and will interview much better for jobs and can speak intelligently about what they accomplished and solve problems on the spot on a white board, while the cheater fumbles and cannot pull out chatGPT during the interview, lol.

1

royalemate357 t1_j4migdx wrote

TF32 is tensorfloat 32, which is a relatively new precision format for newer GPUs. Basically, when doing math, it uses the same number of mantissa as FP16 (10 bits), and the same number of exponent bits as normal float32 (8 bits). more on it here: https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/

3