Recent comments in /f/MachineLearning

deephugs t1_jbtqk9c wrote

The devil is in the details. Getting robots to work reliably in the gritty, dirty environments of agtech is incredibly difficult. Manipulation, even with modern ML and CV, remains very hard. Let's just say there is a reason there aren't a ton of robotics companies selling a product like the one you suggested.

2

quitenominal t1_jbtptri wrote

An embedding is a numerical representation of some data. In this case the data is text.

These representations (read: lists of numbers) can be learned with some goal in mind. Usually you want the embeddings of similar data to be close to one another, and the embeddings of disparate data to be far apart.

Often these lists of numbers representing the data are very long - I think the ones from the model above are 768 numbers. So each piece of text is transformed into a list of 768 numbers, and similar text will get similar lists of numbers.

What's being visualized above is a 2-number summary of those 768. This is referred to as a projection, like how a 3D wireframe casts a 2D shadow. It lets us visualize the embeddings and gives a qualitative assessment of their 'goodness' - a.k.a. are they grouping things as I expect? (Similar texts are close, disparate texts are far.)
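A minimal sketch of that pipeline, assuming the sentence-transformers library and a simple PCA projection (the model name and example texts are illustrative, and real visualizations often use UMAP or t-SNE instead):

```python
# Sketch: embed a few texts, then project the 768-dim vectors down to 2D.
# Assumes sentence-transformers + scikit-learn; names below are illustrative.
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

texts = [
    "The cat sat on the mat.",
    "A kitten was resting on the rug.",
    "Stock markets fell sharply on Tuesday.",
]
model = SentenceTransformer("all-mpnet-base-v2")
embeddings = model.encode(texts)                  # shape (3, 768)
points_2d = PCA(n_components=2).fit_transform(embeddings)  # 2-number summaries
print(points_2d)  # the first two rows should land near each other
```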

4

Simusid OP t1_jbtp8wr wrote

Given three sentences:

  • Tom went to the bank to make a payment on his mortgage.
  • Yesterday my wife went to the credit union and withdrew $500.
  • My friend was fishing along the river bank, slipped and fell in the water.

Reading those, you immediately know that the first two are related because they are both about banks/money/finance. You also know that they are unrelated to the third sentence, even though the first and third share the word "bank". If we naively used a strictly word-based encoding, it might incorrectly associate the first and third sentences.

What we want is a model that can represent the "semantic content", or idea behind a sentence, in a way that lets us make valid mathematical comparisons. We want to create a "metric space". In that space, each sentence will be represented by a vector. Then we use standard math operations to compute the distances between the vectors. In other words, the first two sentences will have vectors that point in basically the same direction, and the third vector will point in a very different direction.

The job of the language models (BERT, RoBERTa, all-mpnet-v2, etc.) is to do the best job possible turning sentences into vectors. The outputs of these models are very high-dimensional, 768 dimensions and higher. We cannot visualize that, so we use tools like UMAP, t-SNE, PCA, and eigendecomposition to find the 2 or 3 most important components and then display them as pretty 2D or 3D point clouds.

In short, the embedding is the vector that represents the sentence in a (hopefully) valid metric space.
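A small sketch of that comparison, assuming the sentence-transformers library (all-mpnet-base-v2 stands in for the models named above):

```python
# Sketch: cosine similarity compares vector directions in the metric space.
from sentence_transformers import SentenceTransformer, util

sentences = [
    "Tom went to the bank to make a payment on his mortgage.",
    "Yesterday my wife went to the credit union and withdrew $500.",
    "My friend was fishing along the river bank, slipped and fell in the water.",
]
model = SentenceTransformer("all-mpnet-base-v2")
embeddings = model.encode(sentences)
sims = util.cos_sim(embeddings, embeddings)
# Expect sims[0][1] to be clearly higher than sims[0][2], even though
# sentences 0 and 2 share the word "bank".
print(sims)
```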

19

wikipedia_answer_bot t1_jbtl62p wrote

**In mathematics, an embedding (or imbedding) is one instance of some mathematical structure contained within another instance, such as a group that is a subgroup. When some object X is said to be embedded in another object Y, the embedding is given by some injective and structure-preserving map f : X → Y.**

More details here: <https://en.wikipedia.org/wiki/Embedding>

This comment was left automatically (by a bot). If I don't get this right, don't get mad at me, I'm still learning!


−1

koolaidman123 t1_jbtkuif wrote

there are some fairly annoying things with pytorch lightning, and some things are definitely harder to do in lightning due to how it's structured. but overall i find for practical purposes i've been liking lightning a lot more than pytorch + accelerate, especially now that you can basically use colossal ai with lightning over deepspeed

4

onebigcat OP t1_jbtjqc4 wrote

I guess it’s a matter of how you define unsupervised, but isn’t SSL closer to supervised learning because there’s a ground truth to compare the prediction to? Whereas if you’re just clustering some high-dimensional data, you might not know what the “true” or most accurate way of clustering that information would be, especially in something like genomics, where there’s a lot of information with an unknown purpose.
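A toy illustration of the distinction (purely a sketch; the data and models are stand-ins): a self-supervised task derives its labels from the data itself, so there is a score to check, while clustering produces assignments with no ground truth to compare against.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

series = np.sin(np.linspace(0, 20, 200))

# "Self-supervised": inputs are sliding windows, the target is the next
# value, so a ground-truth label exists (derived from the data itself).
X = np.stack([series[i:i + 5] for i in range(len(series) - 5)])
y = series[5:]
print("SSL fit score:", LinearRegression().fit(X, y).score(X, y))

# Unsupervised: the cluster assignments come with no "true" labels to
# compare against, only internal criteria like inertia.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("k-means inertia:", km.inertia_)
```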

−1

rshah4 t1_jbtfzig wrote

Two quick tips for finding the best embedding models:

  • The Sentence Transformers documentation compares models: https://www.sbert.net/docs/pretrained_models.html
  • The Massive Text Embedding Benchmark (MTEB) leaderboard has 47 different models: https://huggingface.co/spaces/mteb/leaderboard

These will help you compare different models across a lot of benchmark datasets so you can figure out the best one for your use case.
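Whichever model you land on, loading it is usually one line with sentence-transformers (the model name here is just an example from the sbert list):

```python
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is one of the models compared in the sbert docs
model = SentenceTransformer("all-MiniLM-L6-v2")
print(model.encode(["hello world"]).shape)  # (1, 384) for this model
```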

50

montcarl t1_jbtexjk wrote

This is an important point. The performance similarities indicate that the sentence lengths of the 20k dataset were mostly within the SentenceTransformer max length cutoff. It would be nice to confirm this and also run another test with longer examples. This new test should result in a larger performance gap.
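One way to confirm it, sketched with sentence-transformers (the model name is assumed, and `texts` stands in for the 20k dataset): count how many examples exceed the model's token cutoff.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
limit = model.max_seq_length  # tokens beyond this are silently truncated

texts = ["a short example", "a much longer document " * 200]  # stand-ins
n_over = sum(len(model.tokenizer(t)["input_ids"]) > limit for t in texts)
print(f"{n_over}/{len(texts)} examples exceed the {limit}-token cutoff")
```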

10

Real_Revenue_4741 t1_jbteqca wrote

YOLO is not enough to create these robots. The difficult part of robotics is being able to actuate from visual feedback. The method you are mentioning is called "visual servoing," and it will not be robust enough to actually work. Also, the under-$3K price point is quite a bit lower than what you would expect for these projects.
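For reference, a toy sketch of the image-based visual servoing idea (all names are illustrative; a real system also needs calibrated camera/robot kinematics):

```python
import numpy as np

def servo_step(feature_px, target_px, gain=0.5):
    """Proportional control: command a velocity that shrinks the pixel error."""
    error = np.asarray(target_px, float) - np.asarray(feature_px, float)
    return gain * error

# e.g. a detector (like YOLO) reports the fruit at (400, 260) in a 640x480
# frame; we want it centered at (320, 240)
print(servo_step((400, 260), (320, 240)))  # -> [-40. -10.]
```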

2

Simusid OP t1_jbt91tb wrote

My main goal was just to visualize the embeddings to see if they are grossly different. They are not. That is just a qualitative view. My second goal was to use the embeddings with a trivial supervised classifier. The dataset is labeled with four labels, so I made a generic network to see if there was any consistency in the training. And regardless of hyperparameters, the OpenAI embeddings seemed to always outperform the SentenceTransformer embeddings, slightly but consistently.

This was not meant to be rigorous. I did this to get a general feel of the quality of the embeddings, plus to get a little experience with the OpenAI API.
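The gist of the comparison, as a sketch (not the actual code: a scikit-learn logistic regression stands in for the generic network, and random arrays stand in for the real embeddings and labels):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=2000)  # the four labels
embeddings = {
    "openai": rng.normal(size=(2000, 1536)),  # stand-in for API embeddings
    "sbert": rng.normal(size=(2000, 768)),    # stand-in for SentenceTransformer
}
for name, X in embeddings.items():
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```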

30