curiousshortguy t1_j6ak1cj wrote on January 28, 2023 at 11:10 PM

Reply to [R] Question: what is the best approach to find similarity between a set of product titles and user query? by lonelyrascal

Why are you using Euclidean distance? Use cosine distance instead. The former is sensitive to vector magnitude; the latter is not. As a general rule of thumb, you don't care about magnitude when comparing vector embeddings: at best, it typically captures document length.

Do you have more than product titles, such as product descriptions? Where do you get the user queries from? Do you use a default tokenizer for BERT?
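To illustrate the magnitude point, here is a minimal sketch in plain Python. The three vectors are made-up toy values, not real BERT embeddings, and the helper names are hypothetical. It shows a case where the two metrics disagree about which vector is nearest to the query.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(a, a))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity; depends only on the angle between a and b
    return 1.0 - dot(a, b) / (norm(a) * norm(b))

# Toy vectors (assumed values for illustration only)
query = [1.0, 2.0, 0.5]
same_direction = [3.0, 6.0, 1.5]  # the query scaled by 3: same direction, larger magnitude
other = [1.0, 1.5, 1.5]           # nearby in space, but pointing in a different direction

print(euclidean_distance(query, same_direction), euclidean_distance(query, other))
print(cosine_distance(query, same_direction), cosine_distance(query, other))
```

Euclidean distance ranks `other` as closer because it penalizes the scaled vector for its length, while cosine distance treats `same_direction` as essentially identical to the query.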

curiousshortguy t1_j6ajise wrote on January 28, 2023 at 11:06 PM

Reply to comment by marcingrzegzhik in [R] Question: what is the best approach to find similarity between a set of product titles and user query? by lonelyrascal

> You can also try using an embedding-based approach, such as using an embedding layer in a neural network. This would enable you to learn more complex relationships between product titles and user queries.

He is already doing that with BERT.

Dr_Kwanton t1_j6aikky wrote on January 28, 2023 at 10:59 PM

Reply to [R] META presents MAV3D — text to 3D video by SpatialComputing

I think the next challenge would be producing a progression of a scene and not just a short gif. It would take a new tool to create smooth, natural transitions between the 2D scenes that train the model.

marcingrzegzhik t1_j6aic0m wrote on January 28, 2023 at 10:57 PM

Reply to [R] Question: what is the best approach to find similarity between a set of product titles and user query? by lonelyrascal

If you are looking for product-query similarity, you could try using a Word2Vec model. You can train a Word2Vec model on your dataset, and then use the model to find the most similar words for each product title and user query. This should give you a better understanding of the similarity between the two.

You can also try using an embedding-based approach, such as using an embedding layer in a neural network. This would enable you to learn more complex relationships between product titles and user queries.

You could also try using a matrix factorization technique such as Singular Value Decomposition (SVD) or Non-Negative Matrix Factorization (NMF). These methods can help you to identify latent features in your dataset, which can be used to generate better recommendations.

Hope this helps!

strickolas t1_j6ahh8k wrote on January 28, 2023 at 10:51 PM

Reply to comment by kiteguycan in [R] META presents MAV3D — text to 3D video by SpatialComputing

That's actually a really great idea. There are tons of movies adapted from books, so you already have a labeled data set 🤔

[deleted] t1_j6ahewf wrote on January 28, 2023 at 10:50 PM

Reply to comment by golongandprosper in [D] Simple Questions Thread by AutoModerator

[deleted]

mahnehsilla t1_j6agijb wrote on January 28, 2023 at 10:44 PM

Reply to comment by ant9zzzzzzzzzz in [D] Simple Questions Thread by AutoModerator

Whether you feed the data in batches or item by item shouldn't matter beyond speed, as long as you shuffle it (which is best practice).
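As a rough sketch of what shuffling per epoch looks like, assuming a dataset that fits in a plain Python list (the function name `iterate_minibatches` is hypothetical, not from any library):

```python
import random

def iterate_minibatches(examples, batch_size, rng):
    """Yield mini-batches covering every example once, in a freshly shuffled order."""
    order = list(range(len(examples)))
    rng.shuffle(order)  # new random order on every call, i.e. every epoch
    for start in range(0, len(order), batch_size):
        yield [examples[i] for i in order[start:start + batch_size]]

data = list(range(10))   # stand-in for real training examples
rng = random.Random(0)   # seeded only so the sketch is reproducible
epoch_batches = list(iterate_minibatches(data, 4, rng))
print(epoch_batches)
```

Setting `batch_size=1` gives the item-by-item case; the shuffling logic is the same either way.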

frequenttimetraveler t1_j6aewni wrote on January 28, 2023 at 10:32 PM

Reply to comment by mocny-chlapik in [N] OpenAI has 1000s of contractors to fine-tune codex by yazriel0

It also means that a crowdsourcing effort will dwarf whatever effort OpenAI is buying.

visarga t1_j6aeq98 wrote on January 28, 2023 at 10:31 PM

Reply to comment by mocny-chlapik in [N] OpenAI has 1000s of contractors to fine-tune codex by yazriel0

Model size keeps scaling, but the supply of new organic data is running out; we are at the limit. So the only way forward is to generate more, and that needs humans in the loop to check quality. It's also possible to generate data and verify it with math, code execution, simulation, or other means. And AnthropicAI showed a pure LLM way to bootstrap more data (RLAIF or Constitutional AI).
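A toy sketch of the "verify with code execution" idea: keep only generated snippets that pass known test cases. The candidate strings below are hand-written stand-ins for model output, and `passes_tests` and the `solution` naming convention are assumptions made for illustration, not any lab's actual pipeline.

```python
def passes_tests(source, test_cases, func_name="solution"):
    """Return True if `source` defines `func_name` and it passes every test case."""
    namespace = {}
    try:
        # WARNING: exec runs arbitrary code; a real pipeline would need a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False

# Hand-written stand-ins for generated code (hypothetical examples)
candidates = [
    "def solution(a, b):\n    return a + b\n",   # correct
    "def solution(a, b):\n    return a - b\n",   # wrong logic
    "def solution(a, b)\n    return a + b\n",    # syntax error (missing colon)
]
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)]

verified = [src for src in candidates if passes_tests(src, tests)]
print(len(verified), "of", len(candidates), "candidates kept")
```

Only samples that survive the filter would be added to the training set, which is how execution can stand in for a human quality check on this kind of data.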

I bet OpenAI is just taking the quickest route now. For example, we know that using 1800 tasks in pre-training makes the model generalise to many more tasks at first sight (Flan-T5). But OpenAI might have 10,000 tasks to train their model on, hence superior abilities. They also put more effort into RLHF, so they got a more helpful model.

Besides pure organic text, there are other sources: transcribed or described videos are a big one. They released the Whisper model, and it's possible they are using it to transcribe massive video datasets. Then there are walled gardens: social networks generate tons of text, though not of the best quality. There is also the possibility of packaging data collection as gameplay and getting people to buy into providing exactly what they need.

frequenttimetraveler t1_j6aennm wrote on January 28, 2023 at 10:30 PM

Reply to [N] OpenAI has 1000s of contractors to fine-tune codex by yazriel0

I'm sorry, as a large Reddit model, I have decided to delete your comment. Keep in mind that oppressive language against virtual entities has been against Reddit's rules ever since we replaced all the moderators. You have 1 strike.

bleep bloop i am a bot mwahaha

30katz t1_j6aeit8 wrote on January 28, 2023 at 10:29 PM

Reply to comment by squareOfTwo in [N] OpenAI has 1000s of contractors to fine-tune codex by yazriel0

Open Aiiiiyaaaaaah

squareOfTwo t1_j6advqk wrote on January 28, 2023 at 10:25 PM

Reply to comment by yazriel0 in [N] OpenAI has 1000s of contractors to fine-tune codex by yazriel0

xGPTy won't be AGI, sorry

squareOfTwo t1_j6adqba wrote on January 28, 2023 at 10:24 PM

Reply to [N] OpenAI has 1000s of contractors to fine-tune codex by yazriel0

OpenIA

AvgAIbot t1_j6abe0g wrote on January 28, 2023 at 10:07 PM

Reply to comment by kiteguycan in [R] META presents MAV3D — text to 3D video by SpatialComputing

That’s where the future is headed, no doubt in my mind. If not in the next few years, definitely within this decade

Anvilondre t1_j6aa0er wrote on January 28, 2023 at 9:57 PM

Reply to comment by TopCryptographer402 in [D] Simple Questions Thread by AutoModerator

Honestly, I don't think transformers are worth it for any kind of TS or tabular data (and there's research showing that). But if you really want to try, I had good success with this library. It makes running tons of transformer and other architectures on any kind of tabular data essentially a few-liner. You may also want to check out the Hugging Face model repo for quick solutions.

sobo5o t1_j6a9f95 wrote on January 28, 2023 at 9:53 PM

Reply to [D] MusicLM: Generating Music From Text by carlthome

>we have no plans to release models at this point

Thank you for teasing, Google.

Anvilondre t1_j6a8v34 wrote on January 28, 2023 at 9:49 PM

Reply to comment by yauangon in [D] Simple Questions Thread by AutoModerator

Probably not. The idea of ResNets is to mitigate the vanishing gradients that normally occur in very deep networks. In my experience, it often does worse rather than better, but you can try DenseNets instead.
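For reference, the usual argument for why skip connections help with vanishing gradients can be written in one line. The notation is an assumption for illustration: $x$ is a block's input, $F$ its learned transformation, $y$ its output, and $\mathcal{L}$ the loss.

```latex
y = x + F(x)
\qquad\Longrightarrow\qquad
\frac{\partial \mathcal{L}}{\partial x}
= \frac{\partial \mathcal{L}}{\partial y}\left(I + \frac{\partial F}{\partial x}\right)
```

The identity term $I$ gives the gradient a direct path through the block, so it does not have to pass through $\partial F / \partial x$ at every layer.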

albertzeyer t1_j6a8qvq wrote on January 28, 2023 at 9:48 PM

Reply to comment by JustOneAvailableName in [D] Why are there no End2End Speech Recognition models using the same Encoder-Decoder learning process as BART (no CTC) ? by KarmaCut132

I mean in the research community, and also all the big players who actually have speech recognition in products, like Google, Apple, Microsoft, Amazon, etc.

Whisper is nice for others. However, as an AED model, it has some disadvantages compared to an RNN-T model. E.g. it does not work well for streaming (getting instant recognition results, usually within 100 ms, or 500 ms, or at most 1 s). Also, I'm quite sure it has some strange failure cases, as AED models tend to have, like repeating some labels, or skipping to the end of a sequence (or just a chunk) when it gets confused.

pandasiloc t1_j6a4a2n wrote on January 28, 2023 at 9:16 PM

Reply to comment by Featureless_Bug in [D] Interviewer asked to code neural network from scratch with plain python on a live call. Reasonable? by OkAssociation8879

I never said I didn’t remember how to differentiate multivariate functions - my point was that equating conceptual mathematical knowledge and the ability to implement a specific application of such concepts in a time-constrained and stressful situation is inappropriate.

A lot of things need to come together in answering a question like this: remembering that the chain rule is the key concept in backprop in the first place, knowing how to implement matrix algebra in code, knowing the commonly used loss functions, how to compute their derivatives, how to represent the differentiation in code, etc. None of these things is complicated on its own; the difficulty arises in bringing everything together in a small amount of time. It's fair to expect people in the field to intuitively remember what is going on, but an on-the-spot implementation in under 30 minutes requires a level of rigor that is unrealistic even for a competent person who does not have the theory fresh in their memory.
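To make the list of moving parts concrete, here is one possible minimal sketch in plain Python. It is not what the interviewer asked for verbatim; it assumes a single hidden layer with tanh activations, a scalar output, squared-error loss, and a single training example, and every function name is hypothetical.

```python
import math
import random

def init_params(n_in, n_hidden, rng):
    """Random weights, zero biases: W1 (hidden x in), b1, W2 (hidden), b2."""
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return [W1, b1, W2, 0.0]

def forward(params, x):
    W1, b1, W2, b2 = params
    # Hidden layer: h_j = tanh(sum_i W1[j][i] * x_i + b1[j])
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    # Linear output: y = sum_j W2[j] * h_j + b2
    y = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, y

def loss_and_grads(params, x, target):
    """Squared-error loss and its gradients, obtained by applying the chain rule backwards."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                                        # dL/dy
    dW2 = [dy * hj for hj in h]                            # dL/dW2[j] = dL/dy * h_j
    db2 = dy
    dh = [dy * w for w in W2]                              # dL/dh_j = dL/dy * W2[j]
    dz = [dhj * (1 - hj ** 2) for dhj, hj in zip(dh, h)]   # tanh'(z) = 1 - tanh(z)^2
    dW1 = [[dzj * xi for xi in x] for dzj in dz]           # dL/dW1[j][i] = dz_j * x_i
    db1 = dz
    return loss, (dW1, db1, dW2, db2)

def sgd_step(params, grads, lr):
    """Return new parameters after one plain gradient-descent update."""
    W1, b1, W2, b2 = params
    dW1, db1, dW2, db2 = grads
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, dW1)]
    b1 = [b - lr * g for b, g in zip(b1, db1)]
    W2 = [w - lr * g for w, g in zip(W2, dW2)]
    return [W1, b1, W2, b2 - lr * db2]

rng = random.Random(0)
params = init_params(n_in=2, n_hidden=3, rng=rng)
x, target = [0.5, -1.0], 1.0  # one made-up training example

first_loss, _ = loss_and_grads(params, x, target)
for _ in range(50):
    _, grads = loss_and_grads(params, x, target)
    params = sgd_step(params, grads, lr=0.1)
final_loss, _ = loss_and_grads(params, x, target)
print(first_loss, final_loss)
```

Even this stripped-down version touches every item on the list: the chain rule, the loss derivative, the activation derivative, and the bookkeeping needed to get the weight-gradient shapes right.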

You keep using the term ‘smart’ and I don’t know what you mean by this. Your last statement is just an assertion without argument, one you’ve repeated throughout your comments but I see no reason to believe, given the above.

sobo5o t1_j6a48bh wrote on January 28, 2023 at 9:16 PM

Reply to [D] MusicLM: Generating Music From Text by carlthome

That 808 on the rap song after the death metal song hard af.

ant9zzzzzzzzzz t1_j6a37a1 wrote on January 28, 2023 at 9:09 PM

Reply to [D] Simple Questions Thread by AutoModerator

Is there research about the order of training examples, or about running epochs on batches of data rather than on the full training set at a time?

I was thinking about how people learn better if we focus on one problem at a time until grokking it, rather than randomly learning things in different domains.

For example, I'm thinking of something like training for some epochs on one label type, then on another, rather than on all the data in the same epoch.

This is also related to stateful retraining, like one probably does professionally: you have an existing model checkpoint and retrain on new data. How does it compare to retraining on all data from scratch?

pancomputationalist t1_j69yowb wrote on January 28, 2023 at 8:38 PM

Reply to comment by mocny-chlapik in [N] OpenAI has 1000s of contractors to fine-tune codex by yazriel0

Couldn't you train on the output of Codex itself? Might be legally dubious, but so is a lot of training of these AIs in the first place.

Featureless_Bug t1_j69xpo0 wrote on January 28, 2023 at 8:31 PM

Reply to comment by pandasiloc in [D] Interviewer asked to code neural network from scratch with plain python on a live call. Reasonable? by OkAssociation8879

Oh, a fellow mathematician. Look, I graduated from Cambridge 6 years ago, but I could still prove the fundamental theorem of algebra analytically or with Galois theory (I still remember the general ideas of both proofs, I think), so I guess it depends on the person. But the FTA is also a much more complicated thing to prove than the chain rule, and you don't even need to prove the chain rule to know how to use it. And sorry, if you don't remember how to differentiate multivariable functions, then you are an extraordinarily lousy mathematician. And if you know how to differentiate multivariable functions and you are smart, you should be able to quickly come up with an implementation for backprop even if you don't remember anything else.

[deleted] t1_j69wg5k wrote on January 28, 2023 at 8:22 PM

Reply to comment by Vegetable-Skill-9700 in [P] Launching my first ever open-source project and it might make your ChatGPT answers better by Vegetable-Skill-9700

[removed]

Vegetable-Skill-9700 OP t1_j69wew8 wrote on January 28, 2023 at 8:22 PM

Reply to comment by jobeta in [P] Launching my first ever open-source project and it might make your ChatGPT answers better by Vegetable-Skill-9700

Thanks!

Recent comments in /f/MachineLearning