Recent comments in /f/MachineLearning

dmart89 t1_j4nio9p wrote

Idk, I guess the point is that if text is 100% GPT-written and not reviewed by a human, then there is a risk that GPT learns from bad GPT examples. If you review and modify the text enough to remove the watermark, then it is effectively human-reviewed/labelled content and OK for re-ingestion in future training iterations.

But tbh the guys at OpenAI are pretty capable; I'm sure they'll think of something. I don't know anything beyond the headline I read.

dandandanftw t1_j4nf5i3 wrote

Use a corner detector, then, depending on how many corners you get, brute-force every possible rectangle. You can also use Hough line detection to limit the number of candidate corners. A simple model like an SVM can then compare the corners and patterns of the given images. You should also check out GLCM (gray-level co-occurrence matrix) features for preprocessing the pattern.
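A minimal sketch of the brute-force step, assuming you already have candidate corner points as (x, y) pairs (e.g. from OpenCV's `cv2.goodFeaturesToTrack` or `cv2.cornerHarris`). The helper names and the 10° angle tolerance are illustrative choices, not part of any library. It tries every 4-point subset, orders the points around their centroid, and keeps quadrilaterals whose interior angles are all close to 90°:

```python
from itertools import combinations

import numpy as np


def order_by_angle(points):
    """Sort points counter-clockwise around their centroid."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
    return pts[np.argsort(angles)]


def polygon_area(pts):
    """Shoelace formula for an ordered polygon."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def is_rectangle(quad, angle_tol_deg=10.0):
    """True if every interior angle of the ordered quad is within tol of 90 deg."""
    max_cos = np.cos(np.radians(90.0 - angle_tol_deg))
    for i in range(4):
        a = quad[i - 1] - quad[i]
        b = quad[(i + 1) % 4] - quad[i]
        norm = np.linalg.norm(a) * np.linalg.norm(b)
        if norm == 0:
            return False  # degenerate: repeated point
        if abs(np.dot(a, b) / norm) > max_cos:
            return False
    return True


def find_rectangles(corners, angle_tol_deg=10.0, min_area=1.0):
    """Brute-force all 4-corner subsets; return rectangles, largest first."""
    found = []
    for combo in combinations(range(len(corners)), 4):
        quad = order_by_angle([corners[i] for i in combo])
        if is_rectangle(quad, angle_tol_deg) and polygon_area(quad) >= min_area:
            found.append(quad)
    return sorted(found, key=polygon_area, reverse=True)
```

The number of subsets grows as C(n, 4), which is why pruning the corner set first (for example, keeping only corners near Hough lines) matters.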

nateharada OP t1_j4ne979 wrote

Nice! Right now you can use the end_process trigger to just return 0 when the trigger is hit from the process, but it should be fairly straightforward to externalize the API a little bit more. This would let you do something like this in your script:

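```python
import time
from gpu_sentinel import Sentinel, get_gpu_usage

def my_callback_fn():
    # Your own handler (alert, checkpoint, ...); the exact kill_fn signature is an assumption
    print("GPU sentinel triggered")

sentinel = Sentinel(
    arm_duration=10,
    arm_threshold=0.7,
    kill_duration=60,
    kill_threshold=0.7,
    kill_fn=my_callback_fn,
)
while True:
    # Poll GPUs 0-3 once per second and feed their utilization to the sentinel
    gpu_usage = get_gpu_usage(device_ids=[0, 1, 2, 3])
    sentinel.tick(gpu_usage)
    time.sleep(1)
```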

Is that something that would be useful? You can define the callback function yourself so maybe you trigger an alert, etc.
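To make the proposed arm/kill behaviour concrete, here is a self-contained sketch of what `tick()` might do internally. `MiniSentinel` is a hypothetical stand-in, not gpu_sentinel's code; the semantics (arm once the busiest GPU stays at or above `arm_threshold` for `arm_duration` ticks, then fire `kill_fn` once it stays below `kill_threshold` for `kill_duration` ticks) are assumptions read off the parameter names. With a 1 s sleep per loop, ticks roughly equal seconds.

```python
from typing import Callable, Sequence


class MiniSentinel:
    """Hypothetical arm/kill idle watchdog (assumed semantics, see lead-in)."""

    def __init__(self, arm_duration: int, arm_threshold: float,
                 kill_duration: int, kill_threshold: float,
                 kill_fn: Callable[[], None]) -> None:
        self.arm_duration = arm_duration
        self.arm_threshold = arm_threshold
        self.kill_duration = kill_duration
        self.kill_threshold = kill_threshold
        self.kill_fn = kill_fn
        self.armed = False
        self.killed = False
        self._busy_ticks = 0
        self._idle_ticks = 0

    def tick(self, gpu_usage: Sequence[float]) -> None:
        if self.killed:
            return  # fire the callback at most once
        peak = max(gpu_usage)  # assumption: judge by the busiest GPU
        if not self.armed:
            # Count consecutive busy ticks; reset on any idle tick
            self._busy_ticks = self._busy_ticks + 1 if peak >= self.arm_threshold else 0
            if self._busy_ticks >= self.arm_duration:
                self.armed = True
        else:
            # Count consecutive idle ticks; reset on any busy tick
            self._idle_ticks = self._idle_ticks + 1 if peak < self.kill_threshold else 0
            if self._idle_ticks >= self.kill_duration:
                self.killed = True
                self.kill_fn()
```

Arming first means a job that is still loading data at 0% utilization is not killed before it ever starts training.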

moschles t1_j4nch5w wrote

> Seems to be derived by observing that the most promising work in robotics today (where generating data is challenging) is coming from piggy-backing on the success of large language models (think SayCan etc).

There is nothing really magical being claimed here. LLMs are trained without manual labels (self-supervised, often loosely called unsupervised), essentially by creating distortions of the text and learning to undo them. One type of "distortion" is cloze deletion, but there are others in the panoply of distorted text.

Unsupervised training avoids the bottleneck of having to manually pre-label your dataset.
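A minimal sketch of why no labels are needed, assuming a simple cloze-style corruption: the training targets are just the original tokens at the hidden positions. The function name, 15% rate and `[MASK]` token are illustrative; real recipes differ (BERT, for instance, sometimes substitutes a random token or keeps the original instead of masking, and GPT-style models use next-token prediction instead).

```python
import random


def make_cloze_example(tokens, mask_rate=0.15, mask_token="[MASK]", seed=None):
    """Turn raw tokens into a (corrupted input, targets) training pair.

    targets maps position -> original token; no human labeling involved.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            corrupted[i] = mask_token
            targets[i] = tok
    return corrupted, targets


text = "the robot picked up the red block".split()
inp, tgt = make_cloze_example(text, mask_rate=0.3, seed=1)
```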

When we translate unsupervised training to the robotics domain, what does that look like? Perhaps "next word prediction" is analogous to "next second prediction" of a physical environment. And Cloze Deletion has an analogy to probabilistic "in-painting" done by existing diffusion models.
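The "next second prediction" analogy can be sketched the same way: an unlabeled trajectory of observations already contains its own targets. This is a hypothetical helper, assuming observations are a `(T, D)` array of sensor readings sampled at a fixed rate; it builds (past window, next observation) pairs for any sequence model.

```python
import numpy as np


def next_step_pairs(observations, context=4):
    """Build (past window, next observation) pairs from one trajectory.

    observations: array of shape (T, D).
    Returns X with shape (T - context, context, D) and y with shape (T - context, D).
    """
    obs = np.asarray(observations)
    num = len(obs) - context
    if num <= 0:
        raise ValueError("trajectory shorter than context window")
    X = np.stack([obs[t:t + context] for t in range(num)])
    y = obs[context:]  # the observation right after each window
    return X, y
```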

That's the way I see it. I'm not particularly sold on the idea that the pretraining would be a literal LLM trained on text, ported seamlessly to the robotics domain. If I'm wrong, set me straight.

avocadoughnut t1_j4n5sp8 wrote

From what I've heard, they want a model small enough to run on consumer hardware. I don't think that's currently possible (probably not enough knowledge capacity). But I haven't heard that a decision has been made on this end. The most important part of the project at the moment is crowdsourcing good data.

bubudumbdumb t1_j4n54nk wrote

The key here is that by detecting keypoints you don't need to detect the corners per se: you detect at least a dozen points from the pattern on the card and match them to a reference image of the card; then, assuming the card is a rectangle on a plane, you can fit a homography and identify where the corners are.

In other words, this can be very robust to occlusions: you might see no more than half of the card and still be able to identify where the corners are.
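A minimal sketch of the plane-to-image step with the Direct Linear Transform, assuming you already have matched keypoints (reference-card coordinates to image coordinates). In practice you would more likely call OpenCV's `cv2.findHomography(src, dst, cv2.RANSAC)` to reject bad matches and `cv2.perspectiveTransform` to map the corners; this plain-NumPy version omits RANSAC and coordinate normalization. The corners are projected from the reference card, so they need not be visible in the image.

```python
import numpy as np


def fit_homography(src, dst):
    """DLT: 3x3 H with dst ~ H @ src (homogeneous), from >= 4 point matches."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if len(src) < 4 or len(src) != len(dst):
        raise ValueError("need >= 4 matching point pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Least-squares solution: right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map (N, 2) points through H, dividing out the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


# Usage: card_corners are the reference card's corners (e.g. (0,0), (w,0), (w,h), (0,h))
# H = fit_homography(ref_keypoints, image_keypoints)
# image_corners = apply_homography(H, card_corners)
```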

robobub t1_j4n3gcm wrote

A couple options off the top of my head

  • Add orientation prediction to the bounding box
  • Add keypoints for the 4 actual corners as a prediction
  • Postprocess boxes with classical techniques, looking for the outermost corners that fit certain properties
  • Do everything classically, and deal with the difficulties you have mentioned in your comment.

The first two require annotating these attributes for each box, and they will be predicted directly by the model. Note, though, that you don't have to do this for every label: you can simply not train those parts of the model (i.e. mask out their loss) when certain attributes are unlabeled.
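A minimal sketch of "not training parts of the model" on unlabeled boxes, assuming each box has an extra regression target (here, four corner keypoints flattened to 8 numbers) plus a boolean "is labeled" flag. The function is hypothetical and written in NumPy for clarity; in a deep learning framework the same indexing keeps unlabeled rows out of the loss, so that head receives no gradient from them.

```python
import numpy as np


def masked_l1_loss(pred, target, labeled):
    """Mean L1 error over only the boxes whose attribute is annotated.

    pred, target: (N, K) arrays; labeled: (N,) boolean mask.
    Unlabeled rows are dropped before the error is computed, so their
    target values (even NaN placeholders) never affect the loss.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)
    if not labeled.any():
        return 0.0
    err = np.abs(pred[labeled] - target[labeled])
    return float(err.mean())
```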

Both will require some care in modeling. For example, orientation has a discontinuity at 360 degrees (0° and 360° are the same angle) that your loss needs to handle, and keypoint regression can be done well or badly, so look at how existing corner/keypoint detectors model corners for reference. And then, of course, you'll need to postprocess the model's outputs to align/visualize them on an image.
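Two common ways to handle the 0°/360° wraparound, sketched in NumPy: compute the error on the circle instead of on the raw number, or regress (sin θ, cos θ) and decode with `arctan2`, so nearby angles always have nearby targets. Both are generic techniques, not a specific detector's implementation.

```python
import numpy as np


def angular_error_deg(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in [0, 180]."""
    diff = (np.asarray(pred_deg) - np.asarray(true_deg) + 180.0) % 360.0 - 180.0
    return np.abs(diff)


def encode_angle(deg):
    """Regression target (sin, cos): continuous across 0/360."""
    rad = np.radians(deg)
    return np.stack([np.sin(rad), np.cos(rad)], axis=-1)


def decode_angle(sin_cos):
    """Back to degrees in [0, 360)."""
    sin_cos = np.asarray(sin_cos)
    return np.degrees(np.arctan2(sin_cos[..., 0], sin_cos[..., 1])) % 360.0
```

A naive L1 loss would call 359° vs 1° an error of 358°; `angular_error_deg` reports 2°.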

sayoonarachu t1_j4n2w5j wrote

Quite a bit, and even more if you use optimized frameworks and packages like VoltaML, PyTorch Lightning, ColossalAI, bitsandbytes, xFormers, etc. Those are just the ones I'm familiar with.

Some libraries let you balance the load between CPU, GPU, and memory (e.g. by offloading), though obviously that comes at a cost in speed.

As a general rule, the more parameters a model has, the more memory it needs. So unless you're planning to train from scratch or fine-tune models in the billions of parameters, you'll be fine.
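A back-of-the-envelope sketch of why "billions of parameters" is roughly where consumer cards run out. Weights alone cost parameters × bytes per parameter; for training, a commonly cited rule of thumb for mixed-precision Adam is about 16 bytes per parameter (fp16 weights and gradients, fp32 master weights, two fp32 Adam moments), before counting activations. Treat the 16-byte figure as an approximation, not a guarantee for any particular framework.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="fp16"):
    """Memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def adam_training_memory_gib(num_params, bytes_per_param=16):
    """Rough mixed-precision Adam estimate; ignores activations and buffers."""
    return num_params * bytes_per_param / 2**30


for n in (125e6, 1e9, 7e9):
    print(f"{n / 1e9:g}B params: fp16 weights {weight_memory_gib(n):.1f} GiB, "
          f"Adam training ~{adam_training_memory_gib(n):.1f} GiB")
```

For 1B parameters that is already about 15 GiB for training state alone, which is why offloading and lower precision come up so quickly.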

It's going to take playing around with hyperparameters, switching between 32-, 16-, and 8-bit precision/quantization with PyTorch or other Python packages, testing offloading weights between GPU and CPU, etc., to get a feel for what you can and can't do.
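To give a feel for what "8-bit quant" trades away, here is the simplest scheme, symmetric per-tensor absmax quantization, in NumPy. It is only an illustration: libraries like bitsandbytes use more elaborate schemes (e.g. block-wise quantization, and LLM.int8()'s special handling of outlier features), so their accuracy will differ.

```python
import numpy as np


def quantize_absmax_int8(x):
    """Map floats to int8 so that max |x| lands on +/-127; returns (q, scale)."""
    x = np.asarray(x, dtype=np.float32)
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction; per-element error is at most scale / 2."""
    return q.astype(np.float32) * scale


w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, s = quantize_absmax_int8(w)
# 1 byte per weight instead of 4, at the cost of rounding error
max_err = np.abs(dequantize(q, s) - w).max()
```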

Also, if I remember correctly, PyTorch 2.0 should benefit consumer NVIDIA 40-series cards to some extent once it's more mature.

Edit: P.S. Supposedly the new Forward-Forward algorithm can be "helpful" for large models since there's no backpropagation.
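A minimal single-layer sketch of the Forward-Forward idea, to show what "no backpropagation" means: each layer has its own local objective, so its update needs only that layer's input and output. It assumes the goodness measure from Hinton's proposal (sum of squared ReLU activities) and a logistic loss around a threshold `theta`; it leaves out the layer normalization between layers and how negative data is generated, and the function names and defaults are illustrative.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def ff_layer_loss(W, x, positive, theta):
    """Local loss: push goodness above theta for positive data, below for negative."""
    h = relu(W @ x)
    goodness = np.sum(h ** 2)
    p = 1.0 / (1.0 + np.exp(-(goodness - theta)))  # P(sample is positive)
    loss = -np.log(p) if positive else -np.log(1.0 - p)
    return loss, h, p


def ff_local_step(W, x, positive, theta=2.0, lr=0.01):
    """One gradient step that uses only this layer's input x and output h."""
    loss, h, p = ff_layer_loss(W, x, positive, theta)
    dloss_dgood = -(1.0 - p) if positive else p
    # d goodness / dW = 2 * outer(h, x); ReLU's mask is implicit because h is 0 there
    grad = dloss_dgood * 2.0 * np.outer(h, x)
    return W - lr * grad, loss
```

In a deeper network, each layer would run this same local step on the (normalized) output of the layer below, with no gradient flowing back between layers.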