Recent comments in /f/MachineLearning

dancingnightly t1_j5v5zwe wrote

The internet isn't accessed live by most of these models, as others have said.

You can fine-tune language models, but that doesn't really add knowledge to them; it biases them to output words in an order similar to your sample data. Fine-tuning alone won't add facts.

One approach you can take, though, is semantic search through your notes for a given topic/search query: you collect the relevant notes whose meanings are similar to your topic/search query, then populate a prompt with that text. The answer will use that information and any facts it contains, if the model is big enough and RLHF-tuned (like the ChatGPT/Instruct/text-00x models from OpenAI).

An open-source module for this is GPTIndex; I also work on a commercial solution that encompasses videos etc. too and has some optimisations. It is also possible to add data/facts from the internet to the prompt (context) at generation time; you can use an approach like WebGPT.
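A minimal sketch of that retrieval step, with a toy bag-of-words embedding standing in for a real embedding model (the vocabulary, `embed`, `retrieve`, and `build_prompt` helpers here are made up for illustration):

```python
import numpy as np

def embed(text):
    # Toy stand-in for a real embedding model (e.g. a sentence
    # transformer): a normalized bag-of-words vector over a tiny
    # fixed vocabulary, for illustration only.
    vocab = ["ecg", "model", "notes", "search", "gpu", "loss"]
    v = np.array([text.lower().split().count(w) for w in vocab], float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def retrieve(query, notes, k=2):
    # Rank notes by cosine similarity to the query; keep the top k.
    q = embed(query)
    scored = sorted(notes, key=lambda note: -float(embed(note) @ q))
    return scored[:k]

def build_prompt(query, notes):
    # Populate the prompt with retrieved notes; the model then answers
    # from that context rather than from knowledge baked in by fine-tuning.
    context = "\n".join(retrieve(query, notes))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

notes = ["search my notes", "gpu loss curves", "ecg model results"]
print(build_prompt("which model for ecg", notes))
```

In a real system the bag-of-words `embed` would be replaced by a dense embedding model and an approximate-nearest-neighbour index, but the retrieve-then-populate structure stays the same.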

3

satireplusplus t1_j5v24u2 wrote

If you don't have 8 GPUs, you can always run the same computation 8x in series on one GPU, then merge the results the same way the parallel implementation would. In most cases that will end up being a form of gradient accumulation. Think of it this way: you compute your distances on a subset of the N samples, but since there are far fewer pairs of distances, the gradient will be noisy. So you run it a couple of times and average the results to approximate the real thing. Very likely this is what the parallel implementation does too.
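To see why averaging serial runs reproduces the parallel step, here is a hedged sketch using a plain mean-squared-error gradient (for a per-sample loss like this the match is exact; for a pairwise contrastive loss it is only an approximation, since pairs that span different sub-batches are never computed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))   # full batch: 64 samples, 3 features
y = rng.normal(size=64)
w = np.zeros(3)

def grad(Xb, yb, w):
    # Mean-squared-error gradient of a linear model on one sub-batch.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# One big "parallel" step vs. 8 serial sub-batches, averaged.
g_full = grad(X, y, w)
g_acc = np.mean([grad(X[i::8], y[i::8], w) for i in range(8)], axis=0)

print(np.allclose(g_full, g_acc))
```

Because each sub-batch has the same size, the mean of the 8 sub-batch gradients equals the full-batch gradient exactly; the noise the comment mentions only appears for losses defined over pairs within a batch.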

1

olegranmo OP t1_j5v1xsq wrote

Hi DogeMD,

Thanks for the questions! I introduced the Tsetlin machine in 2018 as an interpretable and transparent alternative to deep learning, and it is becoming increasingly popular, showing promising results in several domains. The paper reports the first approach to using Tsetlin machines for ECG classification, and it is fantastic that you see potential opportunities in myocardial infarction prediction. If you like, I can give an online tutorial on Tsetlin machines for you and your team to give you a head start.

11

DogeMD t1_j5uvxiw wrote

Ole, I haven’t heard about the Tsetlin machine before. My group is doing some ML research using CNN architectures to predict myocardial infarctions. Would love to explore the use of Tsetlin machines for showing ECG signs of infarction to users (doctors) since EU legislation mandates explainability. Have you tried anything like this before and if so, do you think the Tsetlin machine would be a good candidate? We are based in Lund, southern Sweden

9

MysteryInc152 t1_j5uvo3i wrote

Nothing that would beat OpenAI's stuff (or Google's stuff) is open to the public for inference or fine-tuning.

I think the best open-source alternative is this:

https://github.com/THUDM/GLM-130B

https://huggingface.co/spaces/THUDM/GLM-130B

But it's not instruction-finetuned, so you have to prompt it like a text completer. You'll also need 4x RTX 3090s to get it running locally.
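For example, rather than giving a base model a bare instruction, you usually get better results with a completion-style prompt that demonstrates the pattern and lets the model continue it (the prompt strings below are an illustrative sketch, not tied to any specific API):

```python
# Completion-style prompt for a base (non-instruction-tuned) model:
# show the pattern, then let the model continue the text.
completion_prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "cheese => "
)

# Instruction-style prompt, which only instruction-tuned models
# (e.g. Flan-T5, ChatGPT) tend to follow reliably:
instruction_prompt = "Translate 'cheese' to French."

print(completion_prompt)
```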

The best open-source instruction-finetuned models are the Flan-T5 models:

https://huggingface.co/google/flan-t5-xxl

If you're not necessarily looking for open source, but still want actual alternatives that aren't just an API wrapper around GPT, you can try Cohere:

https://cohere.ai/pricing

The good thing is that it's completely free for non-commercial, non-production use.

Or Aleph Alpha:

https://app.aleph-alpha.com/

It's not free, but the pricing is decent, and they have a visual language model as well, something like Flamingo:

https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model

6

farmingvillein t1_j5utusn wrote

You're probably right, but has anyone built an updated set of benchmarks to compare ChatGPT with Google's publicly released numbers? (Maybe yes? Maybe I'm out of the loop?) ChatGPT is sufficiently different from GPT-3.5 that I think we'd need to rerun the benchmarks to compare.

(And, of course, even if we did, there are open questions of potential data leakage: always a concern, but maybe an extra concern here, since it is unclear whether OpenAI would have prioritized that issue in the ChatGPT build-out. It certainly would have been low on my list, personally.)

1

Paedor t1_j5ur6tx wrote

The trouble is that contrastive methods often compare elements from the same batch, instead of treating elements as independent the way pretty much all other ML does (except batch norm).

As a simple example with a really weird version of contrastive learning: with a batch of 2N, contrastive learning might use the 4N^2 distances between batch elements to calculate a loss, while with two accumulated batches of N, it could only use 2N^2 pairs for the loss.
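The pair-count arithmetic above can be checked directly (a tiny sketch; `pairs_in_batches` is just an illustrative helper counting ordered pairs within each batch):

```python
def pairs_in_batches(batch_sizes):
    # Each batch of size b contributes b * b (ordered) element pairs;
    # pairs spanning two separate batches are never compared.
    return sum(b * b for b in batch_sizes)

N = 128
print(pairs_in_batches([2 * N]))  # one batch of 2N -> (2N)^2 = 4N^2
print(pairs_in_batches([N, N]))   # two batches of N -> 2 * N^2
```

So halving the batch and accumulating doesn't just add gradient noise: it removes half of the cross-element pairs the loss could have used.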

11

machine_learning7 t1_j5uoo02 wrote

2 strong rejects, 1 weak accept. Probably a very low chance of success.

Can I add more references to the paper in the meantime? I.e., I want to answer some reviewers by saying I will add the references they asked for and discuss the differences between my work and those references.

1