Recent comments in /f/MachineLearning

xt-89 t1_jbt5yyd wrote

1

Avelina9X t1_jbt4o8y wrote

So the attention mechanism has O(N^2) space and time complexity relative to sequence length. However, if you are memory constrained, it is possible to get the memory requirement down to O(N) by computing only one token at a time and caching the previous keys and values. This is only really possible at inference time, and it requires that the architecture be implemented with caching in mind.
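
A minimal sketch of that KV-caching idea, assuming single-head attention; the weight matrices and function name here are illustrative, not any particular library's API:

```python
import torch

def cached_attention_step(x_t, W_q, W_k, W_v, cache):
    """Attend for one new token, reusing cached keys and values.

    x_t:   (d_model,) embedding of the current token
    cache: dict with "k" and "v" tensors of shape (t, d_head)
    """
    q, k, v = x_t @ W_q, x_t @ W_k, x_t @ W_v  # each (d_head,)

    # Append this token's key/value; the cache grows to O(N) memory total.
    cache["k"] = torch.cat([cache["k"], k[None, :]], dim=0)
    cache["v"] = torch.cat([cache["v"], v[None, :]], dim=0)

    # The new token attends over all cached positions: O(N) work per step,
    # instead of rebuilding the full O(N^2) attention matrix.
    scores = (cache["k"] @ q) / cache["k"].shape[-1] ** 0.5
    weights = torch.softmax(scores, dim=0)
    return weights @ cache["v"]  # (d_head,)
```

Start with `cache = {"k": torch.empty(0, d_head), "v": torch.empty(0, d_head)}` and call this once per generated token.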

1

jsonathan t1_jbt3hqq wrote

This is really fascinating, thanks for sharing. I'm also working on generating natural language representations of Python packages. My approach is:

  1. Extract a call graph from the package, where each node is a function and two nodes are connected if one contains a call to the other.
  2. Generate natural language summaries of each function by convolving over the graph. This involves generating summaries of the terminal nodes (i.e. functions with no dependencies), then passing those summaries to their dependents to generate their summaries, and so on. Very similar to how message passing works in a GNN. The idea here is that summarizing what a function does isn't possible without summaries of what its dependencies do (see the sketch after this list).
  3. Summaries of each function within a file are chained to generate a summary of that file.
  4. Summaries of each file within a directory are chained to generate a summary of that directory, and so on until the root directory is reached.
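
Here's a rough sketch of step 2's bottom-up pass, assuming an acyclic call graph whose edges point from caller to callee; `summarize` is a placeholder for whatever LLM call does the actual summarization:

```python
import networkx as nx

def summarize(source: str, dep_summaries: list[str]) -> str:
    """Placeholder for an LLM call that summarizes `source`,
    given summaries of the functions it depends on."""
    ...

def summarize_package(call_graph: nx.DiGraph) -> dict[str, str]:
    # Nodes are assumed to carry a "source" attribute with the function body.
    summaries: dict[str, str] = {}
    # Reverse topological order: terminal nodes (no outgoing calls) come
    # first, so every function sees its dependencies' summaries before
    # its own turn.
    for fn in reversed(list(nx.topological_sort(call_graph))):
        deps = [summaries[callee] for callee in call_graph.successors(fn)]
        summaries[fn] = summarize(call_graph.nodes[fn]["source"], deps)
    return summaries
```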

I'd love to learn more about the differences/advantages of your approach compared to something like this. Thanks again for your contribution, this is insanely cool!

1

hak8or t1_jbt1wja wrote

/u/NovelspaceOnly Can you verify this?

As for /u/Main_Mathematician77, you are effectively a software developer with the ability to dabble in machine learning. Are you located in the States or elsewhere? It's puzzling how you could be broke with that skill set.

> Don’t @ me saying this is a waste of compute, I know what I’m doing and idgaf.

That is unnecessarily antagonistic/combative.

4

Simusid OP t1_jbsyp5n wrote

Yesterday I set up a paid account at OpenAI. I have been using the free sentence-transformers library and models for many months with good results. I compared the performance of the two by encoding 20K vectors from this repo: https://github.com/mhjabreel/CharCnn_Keras. I did no preprocessing or cleanup of the input text. The OpenAI model is text-embedding-ada-002 and the SentenceTransformer model is all-mpnet-base-v2. The plots are simple UMAP(), with all defaults.

I also built a very generic model with 3 dense layers, nothing fancy. I ran each model ten times for the two embeddings, fitting with EarlyStopping and evaluating with held-out data. The average results were HF 89% and OpenAI 91.1%. This is not rigorous or conclusive, but for my purposes I'm happy sticking with SentenceTransformers. If I need to chase decimal points of performance, I will use OpenAI.
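
For reference, a hedged sketch of that comparison; the OpenAI call uses the pre-1.0 `openai` client, and the placeholder names are made up:

```python
import numpy as np
import openai
import umap
from sentence_transformers import SentenceTransformer

openai.api_key = "sk-..."  # your key
texts = [...]              # the ~20K samples from the repo above, no cleanup

# Free local model
st_emb = SentenceTransformer("all-mpnet-base-v2").encode(texts)

# Paid OpenAI model (the real API caps batch size, so chunk 20K texts
# in practice)
resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
oa_emb = np.array([d["embedding"] for d in resp["data"]])

# Simple UMAP with all defaults, as in the plots
st_2d = umap.UMAP().fit_transform(st_emb)
oa_2d = umap.UMAP().fit_transform(oa_emb)
```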

Edit - The second graph should be titled "SentenceTransformer" not HuggingFace.

79

rainbow3 t1_jbsvc41 wrote

It is a bit like the Daleks: created by a mastermind who forgot about stairs. A tall mower with cameras will get stuck under low-hanging fruit trees. Ultrasound is fine for walls, but less so for flower beds.

There are a lot of unknowns even for one application.

Another example: I had one with a rain sensor so it avoided cutting wet grass. Sounds good, but if there are weeks of rain, the grass does not get cut until it is too long for the mower to handle.

1

NovelspaceOnly OP t1_jbsdr55 wrote

I have some preliminary generation scripts for SMILES chemical graphs, Feynman diagrams, storytelling with interleaved images, and testing compilation rates. Sorry for switching accounts; this one is logged in on my laptop lol.

1

science-raven OP t1_jbsbbyz wrote

An Nvidia Jetson Nano or Raspberry Pi can run AI object detection at about 2 FPS using YOLO code. A YOLO model can differentiate 80 different object classes, and you could run 20-50 different YOLO models to detect a few thousand different objects.

Traditional programming then copies the AI-detected objects onto a 3D map of the zone.
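
A hedged sketch of the detection side, using the ultralytics package; the model file and image name are assumptions:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")    # nano model, plausible for a Jetson/Pi
results = model("frame.jpg")  # or a camera stream, e.g. model(0)

for r in results:
    for box in r.boxes:
        label = model.names[int(box.cls)]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        # These 2D detections are what would get projected
        # onto the 3D zone map.
        print(label, (x1, y1, x2, y2), float(box.conf))
```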

2

[deleted] t1_jbsafxc wrote

Thank you! Yes, I thought the topic tree would be a great complement to the commit tree. It would be great for stale repos with little to no documentation.

There's also the option to mix in multiple repositories and message-pass between them to help with brainstorming new features, or to message-pass between your repo and its dependencies.

1

xt-89 t1_jbsaabf wrote

I also plan on applying the basic idea of a GNN with prompting to the thought loop of a cognitive entity (basically Open Assistant). I believe that if you take the tree you're outputting for code and use it to aid CoT reasoning, that could be pretty powerful.

3