Recent comments in /f/MachineLearning

BeatLeJuce t1_j6mlxjc wrote

your question is answered in the abstract itself ("using only pixels and game points as input"), and repeated multiple times in the text ("In our formulation, the agent’s policy π uses the same interface available to human players. It receives raw RGB pixel input x_t from the agent’s first-person perspective at timestep t, produces control actions a_t ∼ π simulating a gamepad, and receives game points ρ_t attained"). Did you even attempt to read the paper? The concrete architecture showing the CNN is also in Figure S10.

3

PredictorX1 t1_j6mkzl0 wrote

To be clear, there are neural networks which are "deep", and others which are "shallow" (few hidden layers). From a practical standpoint, the latter have more in common with other "shallow" learning methods (tree-induction, statistical regressions, k-nearest neighbor, etc.) than they do with deep learning.

You're right that many people (especially in the non-technical press) have erroneously used "machine learning" to mean specifically "deep learning", just as they've used "artificial intelligence" to mean "machine learning". Regardless, there are still non-deep machine learning methods and other branches of A.I. In practice, non-deep machine learning represents the overwhelming majority of applications today.

I haven't followed the research as closely in recent years, but I can tell you that, deep learning aside, people have only begun to scratch the surface of machine learning application.

54

bitRAKE t1_j6mj7s2 wrote

  1. Ask ChatGPT for an explanation of anything without a known correct answer, and then tell it that "that answer is incorrect". It will proceed to dream up a new answer. This could be non-existent syntax for a programming language, for example. The sequential nature of the model means it can paint itself into a corner quite easily.

  2. Isn't knowledge accuracy a by-product of modeling correct language use to some degree, and not the design goal of the system? A fantasy story is just as valid a language use as a research paper. Accuracy seems to correlate with how the system is primed for the desired context.

2

arg_max t1_j6mg664 wrote

I think diffusion models are kind of a bad example. The SDE paper from Yang Song has shown that it's all about modeling the score function, and this can't be done with simple models. Apart from that, the big text2img models work inside the latent space of a deep VAE, make use of conditioning via cross-attention, which isn't a thing in traditional ML, and use large language models to process the text input. All their components are very DL-based.
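
For readers unfamiliar with the term, "modeling the score function" means learning s_θ(x, σ) ≈ ∇_x log p_σ(x). A minimal denoising score matching sketch (score_model here is a hypothetical callable, not anything from the Song et al. codebase):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoising_score_matching_loss(score_model, x0, sigma):
    # Perturb clean data x0 with Gaussian noise at scale sigma.
    eps = rng.standard_normal(x0.shape)
    x_noisy = x0 + sigma * eps
    # The score of the perturbation kernel q(x_noisy | x0) is -(x_noisy - x0) / sigma**2 = -eps / sigma.
    target_score = -eps / sigma
    # The network is trained so that score_model(x, sigma) matches this target
    # (in practice weighted by sigma**2 and averaged over many noise scales).
    pred_score = score_model(x_noisy, sigma)
    return np.mean(np.sum((pred_score - target_score) ** 2, axis=-1))
```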

13

worriedshuffle t1_j6mduii wrote

> I’d say that even calling it “AI” is misleading because it’s not intelligent.

I’d say it’s misleading for a different reason. We don’t know what intelligence is. Every time a computer can perform a task, that task is no longer considered a test of “intelligence”. Well, if every task is reducible to something unintelligent then perhaps intelligence was really a mirage in the first place.

5

andreichiffa t1_j6mdm66 wrote

On a very high level, transformer-derived architectures struggle with the concept of reality because they need the distributions in the token embedding space to remain wide. Especially for larger models, the training data is so sparse that without that they would struggle with generalization and exposure bias.

Repeated prompting and prompt optimization can pull elements of the training set out of them (in some cases), because in the end they do memorize, but the exact mechanism is not yet clear and cannot be counted on.

You can get around it by adding a "critic" post-processor that classifies whether the model is trying to state a fact, looks it up, and forces the model to re-generate until the statement is factually correct. This is very close to GeDi, the guided generation approach introduced by a Salesforce team back in 2020. Given that OpenAI went this route for ChatGPT and InstructGPT to make them less psycho and more useful to end users (+ iterative fine-tuning from user and critic-model input), there is a good chance they will go this route here as well.
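
As a rough sketch of that critic loop (generate, detect_factual_claims, and verify_claim are hypothetical placeholders here, not any real GeDi or OpenAI API):

```python
def generate_with_fact_critic(prompt, generate, detect_factual_claims, verify_claim, max_retries=5):
    """Regenerate until every detected factual claim checks out, or give up."""
    text = generate(prompt)
    for _ in range(max_retries):
        claims = detect_factual_claims(text)      # classify spans that try to state a fact
        if all(verify_claim(c) for c in claims):  # look each one up
            return text
        text = generate(prompt)                   # force a re-generation
    return text                                   # best effort after max_retries
```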

You can also add discrete non-differentiable layers to train the model to distinguish factual statements from other in-text statements and learn to switch between the modes, allowing it to process them differently. However, you lose the nice back-propagation properties and have to do black-box optimization on the discrete layers, which is costly even by LLM standards. That seems to be the Google approach with PaLM.

3

qalis t1_j6mczg1 wrote

Absolutely not! There is still a lot of research going into traditional ML methods. For tabular data, they are typically vastly superior to deep learning. Boosting models in particular receive a lot of attention due to the very good implementations available (a minimal LightGBM sketch follows the list below). See for example:

- SketchBoost, CuPy-based boosting from NeurIPS 2022, aimed at incredibly fast multioutput classification

- A Short Chronology Of Deep Learning For Tabular Data by Sebastian Raschka, a great literature overview of deep learning on tabular data; spoiler: it does not work, and XGBoost or similar models are just better

- in time series forecasting, LightGBM-based ensembles typically beat all deep learning methods while being much faster to train; see e.g. this paper, and you can also see it in Kaggle competitions and other papers; a friend of mine works in this area at NVIDIA, and their internal benchmarks (soon to be published) show that the top 8 models in a large-scale comparison are in fact various LightGBM ensemble variants, not deep learning models (which, in fact, kind of disappointed them, since it's, you know, NVIDIA)

- all domains requiring high interpretability largely ignore deep learning and put their research effort into traditional ML; see e.g. counterfactual examples, an important interpretability method in finance, or rule-based learning, important in medical and legal applications
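
As promised above, here is a minimal boosting baseline on tabular data using LightGBM's scikit-learn interface (the dataset is just a stand-in for your own features/target):

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Placeholder tabular dataset; swap in your own features and target.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```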

292

qalis t1_j6mbu5s wrote

I recently compiled and went through a reading / watching list, going from basic NLP to ChatGPT:

- NLP Demystified to learn NLP, especially transformers

- Medium article nicely summarizing the main points of GPT-1, 2 and 3

- GPT-1 lecture and GPT-1 paper to learn about general idea of GPT-like models

- GPT-2 lecture and GPT-2 paper to learn about large scale self-supervised pretraining that fuels GPT training

- GPT-3 lecture 1 and GPT-3 lecture 2 and GPT-3 paper to learn about GPT-3

- InstructGPT page and InstructGPT paper to learn about InstructGPT, the sibling model of ChatGPT; as far as I understand, this is the same as "GPT-3.5"

- ChatGPT page to learn about differences between InstructGPT and ChatGPT, which are relatively small as far as I understand; it is also sometimes called "fine-tuned GPT-3.5", AFAIK

Bonus reading (heavy math warning, experience with RL required!):

- the main difference between GPT-3 and InstructGPT/ChatGPT is reinforcement learning with human feedback (RLHF)

- RLHF is based on the Proximal Policy Optimization (PPO) algorithm; a minimal sketch of its clipped objective is included after this list

- PPO page and PPO paper
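
For a taste of what PPO actually optimizes, here is a minimal NumPy sketch of the clipped surrogate objective from the paper (not a full RLHF pipeline):

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s), advantage = estimated A_t (both per-sample arrays).
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the mean of the elementwise minimum of the two terms.
    return np.mean(np.minimum(unclipped, clipped))
```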

3

antodima OP t1_j6m5aqd wrote

Basically it is about the feasibility of ridge regression with sparse inputs, but I want to select a subset of the units of W acting on A and B. For instance, if I have A of shape (2x5) and B of shape (5x5) and I choose units 2 and 4, then columns [0,1,3] of A are zeroed and the rows and columns of B with indices [0,1,3] are also zeroed. I select units 2 and 4 with some importance mechanism. The question is: is there a way of obtaining a W* from the filtered A and B that is similar to the W computed without filtering A and B?

I asked because filtering A and B breaks the inversion, and hence the computation of W. I don't know if there is some way of decomposing B to make the inversion easier, or something like that.
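
To make the problem concrete, here is a toy NumPy sketch of the filtering I describe (the exact roles of A and B are left abstract; the point is only that zeroing rows/columns of B makes the inversion break):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))
B = rng.standard_normal((5, 5)) + 5 * np.eye(5)   # some well-conditioned 5x5 matrix

dropped = [0, 1, 3]          # units NOT selected (the kept units are 2 and 4)
A_f = A.copy()
A_f[:, dropped] = 0.0        # zero the corresponding columns of A
B_f = B.copy()
B_f[dropped, :] = 0.0        # zero the corresponding rows of B...
B_f[:, dropped] = 0.0        # ...and the corresponding columns of B

print(np.linalg.matrix_rank(B_f))   # rank 2 < 5, so B_f is singular
# np.linalg.inv(B_f) now raises LinAlgError: this is the broken inversion I mean.
```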

Anyway thanks for your interest!

1

oh__boy t1_j6m3sah wrote

Interesting, thanks for the detailed answer. This is cool work; I also love working on projects that squeeze out every last ounce of performance possible to solve a problem. I am somewhat skeptical of how much this applies to other architectures / datasets / problems, since you seem to have worked on only one network and one dataset. I hope you find general concepts, show that they apply to more than just that network and dataset, and prove me wrong, though. Good luck with everything!

2

currentscurrents t1_j6m3ik5 wrote

We could make models with trillions of parameters, but we wouldn't have enough data to train them. Multimodality definitely allows some interesting things but all existing multimodal models still require billions of training examples.

More efficient architectures must be possible - evolution has probably discovered one of them.

1

abcdchop t1_j6m17n8 wrote

wait bro, the key benefit is the hierarchical description -- the "language" is just a format for expressing the hierarchical description of the problem in natural language. I think the improvements you're suggesting pretty much describe the paper itself

6

ezelikman t1_j6lx0vm wrote

Hi, author here!

There are a few ways to interpret this question.

The first is, "why generate a bunch of composable small functions - why not generate complete Python/Lean/etc. implementations directly from the high-level sketch?" If you generate 10 complete implementations, you have 10 programs. If you generate 10 implementations of four subfunctions, you have 10,000 programs. By decomposing problems combinatorially, you call the language model less. You can see the benefits in Fig. 6 and our direct compilation ablation. There's also the context window: a hundred 500-token functions from Parsel is a 50,000-token program. You won't get that with Codex alone.
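
To spell the combinatorics out, here is a toy sketch (hypothetical names, not Parsel's actual API):

```python
from itertools import product

# 10 candidate implementations for each of 4 subfunctions combine into
# 10**4 = 10,000 candidate programs.
candidates = {f"subfn_{i}": [f"impl_{i}_{j}" for j in range(10)] for i in range(4)}
programs = list(product(*candidates.values()))
print(len(programs))  # 10000
```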

Another interpretation is, "why do you need to expose intermediate language when you can use a more abstract intermediate representation." You suggest "leveraging the value of LLMs--through a more natural language interface." That's the goal. Parsel is intentionally basically indented natural language w/ unit tests. There's minimal extra syntax for efficiency and generality - ideally, people who've never used Python can understand and write Parsel. The "expert" details here aren't syntax: most people are unfamiliar with the nuances of writing natural language that automatically compiles to code, like the value of comprehensive unit tests.

Another is, "why design a new language instead of writing this as, e.g., a Python library?" My response is we did this too. Internally, Parsel is in Python, and a "Function" class already exists - you can find it on GitHub. Still, you need a process to generate implementations and select one satisfying the constraints, which we call the compiler.

Hope this answers your question!

11

TheCoconutTree t1_j6lu39i wrote

Formatting lat/lng data for neural net feature input:

I've got latitude/longitude columns in a SQL table that I'd like to add as features for a neural net classifier model. In terms of formatting for input, I plan to normalize latitude values to a range between 0 and 1, with 0 mapping to the largest possible negative latitude value and 1 mapping to the largest possible positive latitude value. Then do the same for longitude, and pass them in as separate features.
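
A minimal sketch of that min-max scaling (assuming latitude spans [-90, 90] and longitude spans [-180, 180]):

```python
def normalize_lat_lng(lat, lng):
    # Min-max scale to [0, 1]: -90/-180 map to 0, +90/+180 map to 1.
    lat01 = (lat + 90.0) / 180.0
    lng01 = (lng + 180.0) / 360.0
    return lat01, lng01

print(normalize_lat_lng(40.7, -74.0))  # e.g. New York -> (~0.726, ~0.294)
```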

Does that seem like a reasonable approach? Any other tricks I should know?

1