Recent comments in /f/MachineLearning

MrAcurite t1_j8dnscj wrote

I get that. I've come to actively hate a lot of the big, visual, attention-grabbing work that comes out of labs like OpenAI, FAIR, and to some extent Stanford and Berkeley. I work more in the trenches, on stuff like efficiency, but Two Minute Papers is never going to feature a paper just because it has an interesting graph or two. Such is life.

7

jishhd t1_j8djlmd wrote

That's basically what they talk about in this video, which you may find interesting: https://youtu.be/wYGbY811oMo

TL;DW: It discusses a ChatGPT + WolframAlpha integration in which the language model learns when to call out to external APIs for questions that need them, such as precise mathematics.

You can try it out here by pasting your own API key: https://huggingface.co/spaces/JavaFXpert/Chat-GPT-LangChain
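The pattern described above (a language model deciding when to hand a query off to an external tool) can be sketched in a few lines. This is a toy illustration only, with made-up function names and a crude keyword-free router standing in for the model's own tool-call decision; the real ChatGPT/WolframAlpha and LangChain integrations work differently under the hood.

```python
# Hypothetical sketch of the "LLM decides when to call an external tool"
# pattern. Function names and the routing heuristic are illustrative
# assumptions, not the actual ChatGPT/WolframAlpha or LangChain code.

def call_math_engine(query: str) -> str:
    """Stand-in for an external computation API (e.g. Wolfram Alpha)."""
    # Here we just evaluate simple arithmetic locally; a real integration
    # would send the query to the external service over HTTP.
    return str(eval(query, {"__builtins__": {}}, {}))

def call_language_model(query: str) -> str:
    """Stand-in for a plain LLM completion."""
    return f"[LLM answer to: {query}]"

def answer(query: str) -> str:
    # A real system lets the model itself emit a tool-call token mid-
    # generation; this toy router just checks whether the query looks
    # like pure arithmetic and dispatches accordingly.
    if query and all(c in "0123456789+-*/(). " for c in query):
        return call_math_engine(query)
    return call_language_model(query)
```

The interesting part in the real systems is that the routing decision is learned (or prompted) inside the model itself rather than hard-coded like this.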

10

GFrings t1_j8dixv3 wrote

In general, absolutely yes. In practice, the review process for most tier 1 and 2 conferences right now is a complete roll of the dice. For example, WACV and some other conferences explicitly state in their reviewer guidelines that you should consider the novelty of the approach over the performance, but I still see many reviews that ding the work for lack of SOTA-ness. The best thing you can do is make your work as academically rigorous as possible (have good baseline experiments, ablation studies, analysis...) and submit until you get in. Don't worry about what you can't control, which is randomly being assigned a dud reviewer.

12

throwaway2676 t1_j8digqj wrote

Here are the top 10 posts on my front page right now:

>[R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research

>[D] Quality of posts in this sub going down

>[D] Is a non-SOTA paper still good to publish if it has an interesting method that does have strong improvements over baselines (read text for more context)? Are there good examples of this kind of work being published?

>[R] [N] pix2pix-zero - Zero-shot Image-to-Image Translation

>[P] Extracting Causal Chains from Text Using Language Models

>[R] [P] Adding Conditional Control to Text-to-Image Diffusion Models. "This paper presents ControlNet, an end-to-end neural network architecture that controls large image diffusion models (like Stable Diffusion) to learn task-specific input conditions." Example uses the Scribble ControlNet model.

>[R] [P] OpenAssistant is a fully open-source chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

>[D] What ML dev tools do you wish you'd discovered earlier?

>[R] CIFAR10 in <8 seconds on an A100 (new architecture!)

>[D] Engineering interviews at Anthropic AI?

From this list, the only non-academic/"low quality" posts are the last one and this one. This is consistent with my normal experience, so I'm not really sure what you are talking about.

8

uristmcderp t1_j8db0gw wrote

The whole "assessing its own success" part is the bottleneck for most interesting problems. You can't have a feedback loop unless the system can accurately evaluate whether it's doing better or worse. And that's not a trivial problem either, since humans aren't all that great at using absolute metrics to describe quality once past a minimum threshold.
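The point about feedback loops can be made concrete with a toy hill-climber: the same optimization loop converges when its evaluator is accurate and stalls when the evaluator is mostly noise. All names here are illustrative assumptions, not any particular system.

```python
import random

# Toy illustration: a self-improvement loop is only as good as its
# evaluation function. A hill-climber tries to maximize a hidden
# quality score; with a noisy evaluator it accepts changes that are
# actually worse.

def true_quality(x: float) -> float:
    # Hidden ground truth, peaked at x = 3 (unknown to the optimizer).
    return -(x - 3.0) ** 2

def noisy_eval(x: float, noise: float) -> float:
    # The optimizer only ever sees this noisy proxy for quality.
    return true_quality(x) + random.gauss(0.0, noise)

def improve(steps: int, noise: float) -> float:
    x = 0.0
    for _ in range(steps):
        candidate = x + random.gauss(0.0, 0.5)
        # Accept the change only if the evaluator says it's better.
        if noisy_eval(candidate, noise) > noisy_eval(x, noise):
            x = candidate
    return x

random.seed(0)
accurate = improve(steps=500, noise=0.0)     # reliable feedback: converges near 3
unreliable = improve(steps=500, noise=50.0)  # feedback drowned in noise: a random walk
```

With `noise=0.0` the loop reliably climbs to the peak; with large noise the accept/reject signal carries almost no information, which is the bottleneck being described.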

9

Despacereal t1_j8d971u wrote

In a way yes. I think general intelligence (consciousness in most animals) developed evolutionarily to manage a wide variety of sensory inputs and tasks, and to bridge the gaps between them.

As we develop more individual areas of AI, we will naturally start to combine them to create more powerful programs, such as Toolformer combining the strengths of LLMs and other models. Once we have these connections between capabilities, it should be easier to develop new models that learn these connections more deeply and can do more things.

Some of the things that set us apart from other animals are our incredible language and reasoning capabilities, which allow us to understand and interact with an increasingly complex world and to augment our capabilities with tools. The perceived understanding that LLMs display using only patterns in text is insane. Combine that with the pace of developments in chain-of-thought reasoning, tool use, other areas handling visuals, sound, and motion, and multimodal AI, and the path to AGI is becoming clearer than the vision of a MrBeast™ cataracts patient.

1