Recent comments in /f/MachineLearning

pyepyepie t1_j95f3m2 wrote

I actually think your approach demonstrates the idea better than the original paper. However, the original paper's approach can be implemented with smaller language models, which might be better for people who want to deploy it. Overall, I think the application is almost trivial and I am not surprised it worked well for you (given the crazy power of LLMs).

Great work!

9

pyepyepie t1_j95e9ka wrote

Reply to comment by TeamRocketsSecretary in [D] Please stop by [deleted]

I have implemented GPT-like (transformer) models almost since they came out (not exactly: I worked with the decoder in the context of NMT, and with encoders a lot, like everyone who does NLP; so not quite GPT, but I understand the tech), and I also argue you guys are just guessing. Do you understand how funny it looks when people claim what it is and what it isn't? Did you talk with the weights?

Edit: what I agree with is that this discussion is a waste of time in this sub.

2

DevarshTare t1_j95ct2p wrote

What matters while running models?

Hey guys, I'm new to machine learning and just learning the basics. I am planning to buy a GPU soon for running pre-built models from Google Colab.

My question is: after you build a model, what matters for the model's runtime? Is it the memory, the bandwidth, or the CUDA cores you utilize?

Basically, what makes an already-trained model run faster when used in an application? I can imagine it varies from application to application, but I just wanted to learn what matters most when running pre-trained models.

1

I_like_sources OP t1_j95ckjv wrote

It's not about wanting a free, open-box tool. It's about the lack of transparency and accountability in the AI community. Developers need to take responsibility for their creations and provide support and feedback to users, particularly in critical applications like healthcare or finance. By providing more transparency and support, we can improve the quality and reliability of AI systems, which benefits everyone in the field.

−19

Morteriag t1_j95b954 wrote

Last I checked, Roboflow only had point-to-point vector masks for segmentation. In my experience that makes getting quality annotations a pain. In Labelbox, you can also hold down the mouse button to draw continuously. Hasty.ai focuses on auto-annotation, and by the look of the image you posted, it might be a good fit for your use case.

1

Morteriag t1_j95aodr wrote

That size would do well as a PoC, not much more, and you should be able to annotate all the data within a day or two. Automation does not make much sense at this scale. I love Roboflow for bounding boxes, but LabelBox has superior tools for segmentation. Sure, with a dataset this small you can use cross-validation, although a held-out test set is also preferable. At this scale I would almost consider hand-picking the test set to make sure you get a sense of how the model performs on challenging examples. What is the pixel size of your images? I know microscopy/histology images can typically cover large areas, and one image could in fact be considered a mosaic of many “normal”-sized images.
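To make that concrete, here is a minimal sketch of the split I have in mind, assuming scikit-learn; `load_dataset`, `fit_model`, and `evaluate` are hypothetical placeholders for your own pipeline:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical placeholders standing in for your actual pipeline.
X, y = load_dataset()                    # your annotated images and masks
hard_idx = np.array([3, 17, 42])         # hand-picked challenging examples

# Hold the hand-picked examples out entirely as a test set.
mask = np.ones(len(X), dtype=bool)
mask[hard_idx] = False
X_dev, y_dev = X[mask], y[mask]
X_test, y_test = X[hard_idx], y[hard_idx]

# 5-fold cross-validation on everything else.
scores = []
for train_i, val_i in KFold(n_splits=5, shuffle=True, random_state=0).split(X_dev):
    model = fit_model(X_dev[train_i], y_dev[train_i])
    scores.append(evaluate(model, X_dev[val_i], y_dev[val_i]))

print(f"CV: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
print(f"Hand-picked test set: {evaluate(fit_model(X_dev, y_dev), X_test, y_test):.3f}")
```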

2

_Minos t1_j95amf3 wrote

Hey, creator of the above implementation here.

You're right that there are lots of ways accuracy could feasibly be improved: using more varied APIs, navigating to search results and creating embeddings of the resulting websites, etc. Ultimately, a lot of this more advanced chaining of LLM and API requests can be done with libraries like LangChain.

For this one, I wanted to show how effective a much simpler approach can be. For search results, I simply chain together the returned Google "snippets" and inject the resulting string back into the prompt. Oftentimes this means there is actually conflicting information, such as dates referring to events adjacent to, but ultimately irrelevant to, the search query. However, this is where GPT generally does an excellent job of picking out the correct bit of info, so no more sophisticated filtering or parsing by the app is required: just a raw dump of the search results to the model.
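The core of it really is about as simple as it sounds. A minimal sketch, where `google_search` (any SERP API would do) and `complete` (your LLM call) are hypothetical stand-ins:

```python
# Sketch of the snippet-injection approach described above.
# `google_search` and `complete` are hypothetical stand-ins.
def build_prompt(question: str, snippets: list[str]) -> str:
    # Chain the raw snippets together and inject them into the prompt,
    # leaving it to the model to pick out the relevant bits.
    context = "\n".join(snippets)
    return f"Search results:\n{context}\n\nUsing the search results above, answer: {question}"

question = "When did the James Webb telescope launch?"
snippets = google_search(question)                   # hypothetical: list of snippet strings
answer = complete(build_prompt(question, snippets))  # hypothetical LLM call
```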

17

Lifaux t1_j957q53 wrote

If you're having to debug code, VSCode has really good integrations for running on your remote server. Unless you're already very familiar with vim, it's going to be quicker to set this up.

Ensure you've got rsync experience - no one wants to include venv when pulling your changes back from the remote side.
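For example, a minimal sketch of that sync (wrapped in Python here; the host and paths are placeholders for your own setup):

```python
# Pull changes back from the remote while skipping the venv.
import subprocess

subprocess.run(
    [
        "rsync", "-avz",
        "--exclude", "venv/",          # don't drag the virtualenv along
        "--exclude", "__pycache__/",   # nor bytecode caches
        "user@remote:~/project/", "./project/",
    ],
    check=True,
)
```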

Run the image you'll be using remotely on your local machine via Docker first. Check that your code works; you don't want to be messing around with fixes while your GPUs sit idle.

If you're running compiled code, check the CPU architecture. I wasted a day debugging a fault that came down to compiling StarSpace on a build server with a different architecture from our remote server.

Tmux is a godsend.

19

A1-Delta t1_j9572gt wrote

Tweaking a CNN without retraining makes it sound like you want a no-code option for your machine learning.

Totally agree that model interpretability is a challenge, but there is a whole subsection of our field working on that. The fundamental design of deep learning sort of precludes what you’re talking about - at least given our current understanding of model interpretation. At best, a model may be trained to give options on certain aspects based on its input (we see this all the time), but that doesn’t sound like what you want. It sounds like you want to be able to target specific and arbitrary components of an output and intuitively modify the weights of all nodes contributing to that part of the output - presumably in isolation.

I think your challenge might lie with a fundamental lack of understanding of how these models actually work. I don’t mean that as a dig - they’re complicated. I just want to help bring you to a place of understanding about why the field is how you’re experiencing it.

Not a huge fan of massive edits to original posts after people have started responding. Your newly added recommendations put an onerous responsibility on any open source authors who might make their work public as a hobby rather than a career.

16

Rocksolidbubbles t1_j956kre wrote

Reply to comment by TeamRocketsSecretary in [D] Please stop by [deleted]

>To suggest that it’s performing human like processing of emotions because the internal states of a regression model resemble some notion of intermediate mathematical logic is ridiculous especially in light of research showing these autoregressive models struggle with symbolic logic

Not only that. The debate on 'sentience' won't go away, but it will definitely be a lot more grounded when people who are experts in, for example, the physiology of behaviour, cognitive linguistics, anthropology, philosophy, sociology, psychology, or chemistry get involved.

For one thing, they might mention things like neurotransmitters, microbiomes, and epigenetics, or cultural relativity, or how perception can be relative.

The human brain is embodied and can't be separated from the body; if it were, it would stop thinking the way a human does. There's a really good case to be made (embodied cognition theory) that human cognition partly rests on a metaphorical framework of Euclidean geometric shapes derived from the way a body interacts with its environment.

Our environment is classical physics: up and down, in and out, together and apart; it's all straight lines, boxes, cylinders. We're "out of control," "out of our minds," "in love": self-control, minds, and love are conceived of as containers. Even chimps associate the direction UP with the abstract idea of being higher in the hierarchy. You'll be hard-pressed to find a Western culture where UP doesn't mean good, more, or better, and DOWN doesn't mean bad, less, or worse.

The point being: IF this hypothesis is true, and IF you want something to think at least a little bit like a human, it MAY require a mobile body that can interact with the environment and respond to feedback from it.

This is just one of the many hypotheses that fields outside the hard sciences can add to the debate. It really feels like they're too absent from AI-related subs.

1

I_like_sources OP t1_j9565gb wrote

Good questions. Machine learning models are usually black boxes that either work as expected or don't.

No fine-grained tweaking is possible, only retraining. And the specification of what counts as good training data is vague at best.

That causes unnecessary frustration, wastes time, and amounts to the blind leading the blind.

The attitude is "just offer more and more data and hope the AI will figure things out; if not, offer more." I am sure I am not the only one who sees fault in this approach.

−32

A1-Delta t1_j955sgx wrote

I’m not sure I’m following you. Are you concerned that machine learning models are not easily customizable enough?

Is your trouble with the fundamental concept of transfer learning, that data selection and preparation is difficult, that convolutional neural networks are “black boxes”, or something else?

26