Recent comments in /f/MachineLearning
Ch1nada OP t1_j6idoze wrote
Reply to comment by zachguo in [P] Automating a Youtube Shorts channel with Huggingface Transformers and After Effects by Ch1nada
Yep, precisely. It's a sentiment-analysis model fine-tuned on finance news. The output is then mapped to bullish, neutral, or bearish for the analysis overlay.
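For anyone curious, the gist looks roughly like this (the exact checkpoint isn't named here, so treat ProsusAI/finbert and the example headline as stand-ins):

from transformers import pipeline

# Finance-tuned sentiment model (stand-in checkpoint; labels: positive / negative / neutral)
classifier = pipeline("text-classification", model="ProsusAI/finbert")

# Map raw sentiment labels to the overlay categories
LABEL_MAP = {"positive": "bullish", "negative": "bearish", "neutral": "neutral"}

headline = "Company X beats quarterly earnings expectations"   # example headline
result = classifier(headline)[0]                               # e.g. {"label": "positive", "score": 0.93}
print(LABEL_MAP[result["label"]], round(result["score"], 2))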
duck_mopsi t1_j6ibwq0 wrote
Reply to [D] Simple Questions Thread by AutoModerator
I am trying to create a GAN with RNNs. To that end, I'm building stacked GRU cells which get fed the random input. I implemented it as follows:
from tensorflow import keras

def build_generator():
    # latent noise input; LATENT_SHAPE is defined elsewhere in my code
    inputs = keras.Input(shape=[LATENT_SHAPE])
    # stack of 7 GRU cells wrapped in a single RNN layer
    cell = keras.layers.StackedRNNCells(
        [keras.layers.GRUCell(64, activation='tanh') for _ in range(7)])
    rnn = keras.layers.RNN(cell, return_sequences=True)
    x = rnn(inputs)
    return keras.models.Model(inputs, x)
However, every time I try to call the method, I get the following error:
I found basically the same StackedRNNCells implementation in the second-to-newest push of TimeGAN. Yet I still get the error and don't know how to fix it.
zachguo t1_j6ibmvd wrote
Reply to [P] Automating a Youtube Shorts channel with Huggingface Transformers and After Effects by Ch1nada
Why do you need sentiment analysis? To categorize the news into "bullish" and "bearish"?
EmmyNoetherRing t1_j6i8xfv wrote
I hate to say it, but I think the actual answer to “as compared to what” is “as compared to my human professor”.
People using it to learn are having interactions that mimic interactions with teachers/experts. When they mention hallucinations, I think it’s often in that context.
synth_mania t1_j6i8qkk wrote
Reply to [D] DL university research PC suggestions? by seanrescs
Well, you're sacrificing GPU virtualization, AFAIK. Only enterprise cards get native support for that feature without hacks that may or may not work.
[deleted] t1_j6i4vna wrote
Reply to comment by [deleted] in [R] InstructPix2Pix: Learning to Follow Image Editing Instructions by Illustrious_Row_9971
[removed]
grenouillefolle t1_j6i36hx wrote
Reply to [D] Simple Questions Thread by AutoModerator
I have a (seemingly) simple question concerning systematic studies for classification problems. Is there any literature (books, papers) describing an approach for systematic studies on classifiers, such as varying the size of the training sample, number of input variables, size of the correlation between input variables and classes on simulated data, type of classifier, configuration of parameters of the algorithm etc.?
The goal is to prove the robustness and limitations of the method before training on real data. While I have a good sense of what can and should be done, I want to point a beginner in the right direction for a project without doing all the hard work myself.
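To make the idea concrete, this is roughly the kind of sweep I have in mind on simulated data with scikit-learn (the parameter values and classifiers are just placeholders):

import itertools
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Vary training-set size, number of features, class separation and classifier type,
# and record cross-validated accuracy for each combination.
for n_samples, n_features, class_sep, clf in itertools.product(
        [200, 1000, 5000], [5, 20], [0.5, 1.5],
        [LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=100)]):
    X, y = make_classification(n_samples=n_samples, n_features=n_features,
                               n_informative=3, n_redundant=0,
                               class_sep=class_sep, random_state=0)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(n_samples, n_features, class_sep, type(clf).__name__, round(score, 3))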
LornartheBreton t1_j6i180q wrote
Please let us know when it's published so I can tell my university to buy some copies for its library!
tysam_and_co OP t1_j6hxgzk wrote
Reply to comment by shellyturnwarm in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
Hi hi hiya there! Great questions, thanks so much for asking them! :D
For the dataloaders, that dataloading only happens once -- after that, it's just saved on disk as a tensor array in fp16. It's wayyyyy faster for experimentation this way. We only need to load the data once, then we move it to GPU, then we just dynamically slice it on the GPU each time! :D
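If it helps, the pattern is roughly this (not the actual repo code; the cache filename and batch size are just made up for illustration):

import os, torch, torchvision
import torchvision.transforms as T

CACHE = "cifar10_train_fp16.pt"                               # hypothetical cache file
if not os.path.exists(CACHE):
    ds = torchvision.datasets.CIFAR10(".", train=True, download=True, transform=T.ToTensor())
    images = torch.stack([x for x, _ in ds]).half()           # (50000, 3, 32, 32), fp16
    labels = torch.tensor([y for _, y in ds])
    torch.save((images, labels), CACHE)                       # pay the dataloading cost once

images, labels = torch.load(CACHE)
images, labels = images.cuda(), labels.cuda()                 # whole dataset lives on the GPU
idx = torch.randperm(len(images), device="cuda")[:512]        # dynamic slicing, no DataLoader
x, y = images[idx], labels[idx]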
As for self.se, that used to be a flag for the squeeze_and_excite layers. I think it's redundant now as it's just a default thing -- this is a one person show and I'm moving a lot of parts fast so there's oftentimes little extraneous bits and pieces hanging around. I'll try to clean that up on the next pass, very many thanks for pointing that out and asking!
I'm happy to answer any other questions that you might have! :D
tysam_and_co OP t1_j6hx45k wrote
Reply to comment by JamesBaxter_Horse in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
Thanks for sharing. I think you might be missing some of the bigger picture here! Most of the changes and performance improvements did indeed come from changing the architecture, memory format, execution order, network width, etc. in the right places. These come from about five previous years of experience where my primary task was architecting networks like this. I actually transferred a number of personal lessons learned into this network to get a lot of the benefits that we have here. So I'm not quite sure why they would not scale to other problems all of a sudden! ;P That said, a few tweaks might be in order to line up with the inductive biases of the network on different datasets (in this case, say, 1-2 more downscaling blocks for ImageNet, or something like that).
I also wouldn't focus on the hyperparameter twiddling that much -- though it is important and can definitely be a trap. When you're pushing for a world record, every option is on the table, and hyperparameters promise results but are exponentially more expensive to work with. But the 'good enough' parameter space should be pretty flat outside of that regime, so it's likely not too bad of a starting place.
I'm a bit curious about how this would not be reproducible on another dataset (especially if we're narrowing our inductive space -- this should increase generalization, not reduce it!). Similar to Transformers, the simpler and more scalable this architecture is, the better. One of my go-tos for people newer to the field is to encourage them to keep things as simple as possible. It pays off!
In this case, for example, before release, I just added 70 epochs and doubled the base width, and went from 94.08% to 95.77%. That's a good sign! It should at least have good basic performance on other datasets, and if something has to be changed, it's probably just a few hyperparameters, and not all of them, if that makes sense.
shellyturnwarm t1_j6hw7yq wrote
In your dataloaders, why do you set persistent_workers to False? And why do you choose 2 for num_workers?
Also, what does self.se stand for in ConvGroup and what is it doing there?
Finally what is whitening, and what are you trying to achieve with it?
off99555 t1_j6hvsgn wrote
Reply to comment by JEFFREY_EPSTElN in [R] InstructPix2Pix: Learning to Follow Image Editing Instructions by Illustrious_Row_9971
This model asks you to provide an edit instruction instead of two prompts describing the input and output images.
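Rough sketch of what that looks like with the diffusers pipeline (checkpoint name from the public InstructPix2Pix release; the instruction and settings are just illustrative):

import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16).to("cuda")

image = Image.open("input.jpg").convert("RGB")
edited = pipe("make it look like winter",        # a single edit instruction,
              image=image,                       # not an input/output prompt pair
              num_inference_steps=20,
              image_guidance_scale=1.5).images[0]
edited.save("output.jpg")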
Dry-Tomatillo449 t1_j6htvqh wrote
Reply to [P] AI Content Detector by YoutubeStruggle
An AI Content Detector is artificial intelligence software used to detect and analyze content from various sources such as images, audio, or video. It can identify objects in images, detect text in audio and video recordings, or find relevant topics in documents. Such detectors can automate tasks like content curation, filtering, and recommendation, and can inform decisions about what content should be included in a website, blog, or other online material. Additionally, they can classify images, audio, video, and text in order to better understand the content and provide more relevant results.
EsEsMinnowjohnson t1_j6hrhaq wrote
Reply to [D] Remote PhD by TheRealMrMatt
Yes for MS with a bunch of schools that are dedicated to building online platforms (eg Oregon State) but really hard to find online PhD programs. Looks like NDSU offers one in CS, if you’re not too picky about prestige (go bison!).
I’m currently in Oregon State’s MS in Environmental Sciences program doing research on remote sensing and tree physiology (obviously not CS, but a useful enough anecdote). Here are the main things to know about an online program of this nature:
- you pretty much have to be self-funded. Research and teaching assistantships aren’t unheard of but you’d likely be waiting a while (years) for an opportunity to open up.
- some institutions offer a choice between coursework only or a full thesis/dissertation. The former is easier and often doesn’t require a major advisor. The latter always does.
- one of the main reasons people are declined from the OSU program I’m in is their failure to secure a major advisor. This is probably true a lot of places. It’s hard to find faculty that are comfortable with taking on a remote student AND have data that already exist and can be processed remotely for a meaningful thesis. In my case my advisor has field sensors set up in a ponderosa pine stand and has 2 years of data that we’ll use to validate models based on existing Landsat data. That’s a great project because it requires no lab/field work or specialized equipment.
So far I’ve really enjoyed the OSU program, and I think we’ll continue to see a lot more of this in the future.
atanstark t1_j6hqvtb wrote
Reply to comment by Iffysituation in [D] Meta AI Residency 2023 by BeautyInUgly
Hey! I don't know but you might check for the OpenAI residency and see if it's still open for application. You can go to their career website to check.
[deleted] t1_j6hnq30 wrote
Reply to comment by TrumanCian in [R] InstructPix2Pix: Learning to Follow Image Editing Instructions by Illustrious_Row_9971
[removed]
SatoshiNotMe t1_j6hmia3 wrote
Also, subscribe to the LabML trending papers newsletter. I like this because it’s based on papers trending on twitter, which means I don’t have to actually go doom-scrolling on twitter :)
Doriens1 t1_j6hkxl4 wrote
Reply to [D] Remote PhD by TheRealMrMatt
From my experience, I never saw a PhD program advertising remote work, but in practice, it heavily depends on the advisor and the research team.
I joined two different teams (both from the same lab) during my PhD. I don't like remote work much, so I basically came to the lab every day.
In the first team, only two of us came to the lab regularly, while all the other researchers stayed at home. In the second team, however, almost everybody was there.
Now you might say that it was during the pandemic, but it remains true even today.
Usually, researchers are pretty open to discussion when you apply for a PhD, so this is something you will have to discuss with them to know the habits of the team.
Ch1nada OP t1_j6hk9ae wrote
Reply to comment by _poisonedrationality in [P] Automating a Youtube Shorts channel with Huggingface Transformers and After Effects by Ch1nada
I think there is no clear response from their side, but from the guidelines one can infer that they'd cut it only if there's no added value (in this case the summary and analysis, for instance) and it's mass-produced, for example just chopping up clips from a TV show with random TTS.
fmai t1_j6hjauf wrote
GPT-3 ranks relatively low on SuperGLUE because it was not fine-tuned on the SuperGLUE tasks, whereas T5, etc. were. The amazing thing about GPT-3 is that you can reach impressive performance with just few-shot prompting, which was unknown before.
As to your questions:
- AFAIK, OpenAI hasn't published any numbers themselves and nobody outside of OpenAI has API access to ChatGPT yet, making it difficult to assess its performance on often thousands of examples from a benchmark. So, no, so far the performance improvement hasn't been quantified.
- No, there is no quantitative analysis. Most people seem to agree that, anecdotally, ChatGPT seems to hallucinate far less than GPT-3. But you can definitely get ChatGPT to generate bullshit if you keep digging, so it's far from perfect. Depending on what story you want to tell, some people will emphasize one or the other. Take it all with a grain of salt until we get solid numbers.
- AFAIK, LLMs are fantastic at closed-book question answering, where you're not allowed to look at external resources. I think a T5-based model was the first to show that it can answer trivia questions well from knowledge stored in the model parameters only. For open-book QA you will need to augment the LLM with some retrieval mechanism (which ChatGPT doesn't have yet), and therefore you can expect other models to be much better in this regard.
tectoniteshade t1_j6hj3ic wrote
Reply to [D] Simple Questions Thread by AutoModerator
While the number and sophistication of AI tools have taken a sharp upward turn, there's one particular type of tool I tried and failed to find: one that would change the facial expression in a photograph or other still image. I found some toy-like phone apps with very limited sets. The closest thing to a professional tool I was able to find was Photoshop's neural filters. Those were introduced a couple of years ago already, so one would think more advanced specialized tools for this purpose would exist by now. Are there such tools? Did my google-fu just fail?
Jean-Porte t1_j6hif9e wrote
T5 is fine-tuned on supervised classification, i.e. trained to output the labels directly. That's why it outperforms GPT-3.
Generative models are not as good as discriminative models at discriminative tasks. A carefully tuned DeBERTa is probably better than ChatGPT. But ChatGPT has a user-friendly text interface. And GLUE-style evaluation is not charitable to ChatGPT's capabilities: the model might internally store the answer but be misaligned with the benchmark.
I always wonder why we don't try to scale up discriminative models. DeBERTa-xxlarge is "only" 1.3B parameters, and it outperforms T5 13B.
tysam_and_co OP t1_j6hhh19 wrote
Reply to comment by DisWastingMyTime in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
https://github.com/tysam-code/hlb-CIFAR10/releases
Hope this helps, feel free to let me know either way, many thanks! :D :))) <3 <3 <3 <3 :D :D :D :D :)
DisWastingMyTime t1_j6hh4f2 wrote
Is there anywhere I could see a summary of the decisions taken/changes made?
I saw you linked to the original paper that started this, I'll look into it, but I hope there's a more readable way to go over your experiments and insights than browsing the code.
Very interesting though, thanks for sharing!
colugo t1_j6ifbqy wrote
Reply to [D] DL university research PC suggestions? by seanrescs
Tim Dettmers has your answer.