Recent comments in /f/MachineLearning

duck_mopsi t1_j6ibwq0 wrote

I am trying to build a GAN with RNNs, so I'm creating stacked GRU cells that get fed the random input. I implemented it as follows:

from tensorflow import keras

def build_generator():
    # LATENT_SHAPE is a constant I define elsewhere
    inputs = keras.Input(shape=[LATENT_SHAPE])
    cell = keras.layers.StackedRNNCells(
        [keras.layers.GRUCell(64, activation='tanh') for _ in range(7)])
    rnn = keras.layers.RNN(cell, return_sequences=True)
    x = rnn(inputs)
    return keras.models.Model(inputs, x)

However, every time I try to call the method, I get the following error:

Error

I found basically the same StackedRNNCells implementation in the second-to-newest push of TimeGAN, yet I get this error and don't know how to fix it.
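
In case it's relevant: my understanding is that keras.layers.RNN expects 3-D input of shape (batch, timesteps, features), while the latent input above is only 2-D, so maybe that's the mismatch? A sketch of what I mean (SEQ_LEN and LATENT_DIM are placeholder names I made up for the time and feature dimensions):

import numpy as np
from tensorflow import keras

SEQ_LEN = 24      # placeholder: number of timesteps per sample
LATENT_DIM = 32   # placeholder: size of the noise vector at each step

def build_generator_3d():
    # RNN layers want 3-D input: (batch, timesteps, features)
    inputs = keras.Input(shape=[SEQ_LEN, LATENT_DIM])
    cell = keras.layers.StackedRNNCells(
        [keras.layers.GRUCell(64, activation='tanh') for _ in range(7)])
    rnn = keras.layers.RNN(cell, return_sequences=True)
    return keras.models.Model(inputs, rnn(inputs))

noise = np.random.normal(size=(8, SEQ_LEN, LATENT_DIM)).astype('float32')
print(build_generator_3d()(noise).shape)  # expect (8, 24, 64)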

1

EmmyNoetherRing t1_j6i8xfv wrote

I hate to say it, but I think the actual answer to “as compared to what” is “as compared to my human professor”.

People using it to learn are having interactions that mimic interactions with teachers/experts. When they mention hallucinations, I think it’s often in that context.

4

grenouillefolle t1_j6i36hx wrote

I have a (seemingly) simple question concerning systematic studies of classification problems. Is there any literature (books, papers) describing an approach to systematic studies of classifiers: varying the size of the training sample, the number of input variables, the strength of the correlation between input variables and classes in simulated data, the type of classifier, the configuration of the algorithm's parameters, etc.?

The goal is to establish the robustness and limitations of the method before training on real data. While I have a good sense of what can and should be done, I want to point a beginner in the right direction for a project without doing all the hard work myself. Roughly, I have something like the sketch below in mind.
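
A minimal sketch of such a study, assuming scikit-learn and simulated data (all the parameter grids here are placeholders):

import itertools
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Grid over the study dimensions: sample size, class separability
# (a proxy for how strongly inputs correlate with the class), classifier.
for n_samples, class_sep, clf in itertools.product(
        [200, 1000, 5000],
        [0.5, 1.0, 2.0],
        [LogisticRegression(max_iter=1000), RandomForestClassifier()]):
    X, y = make_classification(n_samples=n_samples, n_features=20,
                               n_informative=5, class_sep=class_sep,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(n_samples, class_sep, type(clf).__name__, round(acc, 3))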

2

tysam_and_co OP t1_j6hxgzk wrote

Hi hi hiya there! Great questions, thanks so much for asking them! :D

For the dataloaders, that dataloading only happens once -- after that, it's just saved on disk as a tensor array in fp16. It's wayyyyy faster for experimentation this way. We only need to load the data once, then we move it to GPU, then we just dynamically slice it on the GPU each time! :D
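
If it helps, here's the rough shape of the idea as a toy sketch (not the actual repo code; it assumes a CUDA device and made-up sizes):

import torch

# One-time cost: build the dataset tensor and cache it to disk in fp16.
data = torch.randn(50000, 3, 32, 32)        # stand-in for the real dataset
torch.save(data.half(), 'train_data.pt')

# Every run afterwards: load once, move to GPU once...
train = torch.load('train_data.pt').to('cuda')

# ...then every batch is just a dynamic slice, entirely on-GPU.
for step in range(10):
    idx = torch.randint(0, len(train), (512,), device='cuda')
    batch = train[idx]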

As for self.se, that used to be a flag for the squeeze_and_excite layers. I think it's redundant now as it's just a default thing -- this is a one person show and I'm moving a lot of parts fast so there's oftentimes little extraneous bits and pieces hanging around. I'll try to clean that up on the next pass, very many thanks for pointing that out and asking!

I'm happy to answer any other questions that you might have! :D

1

tysam_and_co OP t1_j6hx45k wrote

Thanks for sharing. I think you might be missing some of the bigger picture here! Most of the changes and performance improvements did indeed come from changing the architecture, memory format, execution order, network width, etc. in the right places. These come from about five years of prior experience where my primary task was architecting networks like this. I transferred a number of personal lessons learned into this network to get a lot of the benefits that we have here. So I'm not quite sure why they would suddenly not scale to other problems! ;P That said, there might be some tweaks needed to line up with the inductive biases of the network on different datasets (in this case, say, 1-2 more downscaling blocks for ImageNet, or something like that).

I also wouldn't focus on the hyperparameter twiddling that much -- though it is important and can definitely be a trap. When chasing a world record, every option is on the table, and hyperparameters promise results but are exponentially more expensive to work with. Outside of that regime, though, the 'good enough' parameter space should be pretty flat, so this is likely not too bad of a starting place.

I'm a bit curious about how this would not be reproducible on another dataset (especially if we're narrowing our inductive space -- this should increase generalization, not reduce it!). Similar to Transformers, the simpler and more scalable this architecture is, the better. One of my go-tos for people newer to the field is to encourage them to keep things as simple as possible. It pays off!

In this case, for example, before release, I just added 70 epochs and doubled the base width, and went from 94.08% to 95.77%. That's a good sign! It should at least have good basic performance on other datasets, and if something has to be changed, it's probably just a few hyperparameters, and not all of them, if that makes sense.

2

Dry-Tomatillo449 t1_j6htvqh wrote

An AI content detector is artificial-intelligence software used to detect and analyze content from sources such as images, audio, or video. It can identify objects in images, detect text in audio and video recordings, or find relevant topics in documents. These detectors can automate tasks such as content curation, filtering, and recommendation, and can inform decisions about what content to include on a website, blog, or other online material. They can also identify and classify images, audio, video, and text to better understand the content and provide more relevant results.

1

EsEsMinnowjohnson t1_j6hrhaq wrote

Yes for an MS -- a bunch of schools are dedicated to building online platforms (e.g. Oregon State) -- but it's really hard to find online PhD programs. It looks like NDSU offers one in CS, if you're not too picky about prestige (go Bison!).

I'm currently in Oregon State's MS in Environmental Sciences program, doing research on remote sensing and tree physiology (obviously not CS, but a useful enough anecdote). Here are the main things to know about an online program of this nature:

  • you pretty much have to be self-funded. Research and teaching assistantships aren't unheard of, but you'd likely be waiting a while (years) for an opportunity to open up.
  • some institutions offer a choice between coursework only or a full thesis/dissertation. The former is easier and often doesn’t require a major advisor. The latter always does.
  • one of the main reasons people are declined from the OSU program I’m in is their failure to secure a major advisor. This is probably true a lot of places. It’s hard to find faculty that are comfortable with taking on a remote student AND have data that already exist and can be processed remotely for a meaningful thesis. In my case my advisor has field sensors set up in a ponderosa pine stand and has 2 years of data that we’ll use to validate models based on existing Landsat data. That’s a great project because it requires no lab/field work or specialized equipment.

So far I’ve really enjoyed the OSU program, and I think we’ll continue to see a lot more of this in the future.

2

Doriens1 t1_j6hkxl4 wrote

In my experience, I've never seen a PhD program advertising remote work, but in practice it heavily depends on the advisor and the research team.

I joined two different teams (both from the same lab) during my PhD. I don't like remote work much, so I basically came to the lab every day.

In the first team, only two of us came to the lab regularly while all the other researchers were at home. In the second team, however, almost everybody was there.

Now you might say that it was during the pandemic, but it remains true even today.

Usually, researchers are pretty open to discussion when you apply for a PhD, so this is something you'll have to discuss with them to learn the habits of the team.

1

fmai t1_j6hjauf wrote

GPT-3 ranks relatively low on SuperGLUE because it was not finetuned on the SuperGLUE tasks, whereas T5 etc. were. The amazing thing about GPT-3 is that you can reach impressive performance with just few-shot prompting, which was unheard of before.

As to your questions:

  1. AFAIK, OpenAI hasn't published any numbers themselves, and nobody outside of OpenAI has API access to ChatGPT yet, which makes it difficult to assess its performance on the thousands of examples a benchmark often contains. So, no, the performance improvement hasn't been quantified so far.

  2. No, there is no quantitative analysis. Most people seem to agree that, anecdotally, ChatGPT seems to hallucinate far less than GPT-3. But you can definitely get ChatGPT to generate bullshit if you keep digging, so it's far from perfect. Depending on what story you want to tell, some people will emphasize one or the other. Take it all with a grain of salt until we get solid numbers.

  3. AFAIK, LLMs are fantastic at closed-book question answering, where you're not allowed to look at external resources. I think a T5-based model was the first to show that it could answer trivia questions well from knowledge stored in the model parameters alone. For open-book QA you will need to augment the LLM with some retrieval mechanism (which ChatGPT doesn't have yet), so you can expect retrieval-augmented models to be much better in this regard -- see the sketch below.
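
To make the retrieval step concrete, here's a toy sketch (TF-IDF as a stand-in for a real learned retriever; the corpus, question, and prompt format are all made up):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus and question -- stand-ins for a real document store.
docs = ["The Eiffel Tower is 330 metres tall.",
        "Mount Everest is the highest mountain on Earth."]
question = "How tall is the Eiffel Tower?"

# Retrieve the most relevant passage for the question.
vec = TfidfVectorizer().fit(docs + [question])
sims = cosine_similarity(vec.transform([question]), vec.transform(docs))
best = docs[sims.argmax()]

# Prepend the retrieved passage so the LLM can answer open-book.
prompt = f"Context: {best}\nQuestion: {question}\nAnswer:"
print(prompt)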

9

tectoniteshade t1_j6hj3ic wrote

While the number and sophistication of AI tools have taken a sharp upward turn, there's one particular type of tool I tried and failed to find: one that changes the facial expression in a photograph or other still image. I found some toy-like phone apps with very limited sets of expressions. The most professional tool I was able to find was Photoshop's neural filters, which were introduced a couple of years ago already, so one would think more advanced specialized tools for this purpose would exist by now. Are there such tools? Did my google-fu just fail?

1

Jean-Porte t1_j6hif9e wrote

T5 is fine-tuned on supervised classification tasks and trained to output the labels as text. That's why it outperforms GPT-3.
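
Concretely, the "classification" is literally text generation. A rough sketch with HuggingFace transformers (assuming I remember the task prefix right -- "sst2 sentence:" is one of the prefixes T5 was fine-tuned with):

from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# The label comes out as generated text, not as a softmax over classes.
inputs = tok("sst2 sentence: this movie was great", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=4)
print(tok.decode(out[0], skip_special_tokens=True))  # e.g. "positive"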

Generative models are not as good as discriminative models at discriminative tasks. A carefully tuned DeBERTa is probably better than ChatGPT, but ChatGPT has a user-friendly text interface. And GLUE-style evaluation is not charitable to ChatGPT's capabilities: the model might internally store the answer yet be misaligned with the benchmark.

I always wonder why we don't try to scale up discriminative models. DeBERTa-xxlarge is "only" 1.3B parameters, and it outperforms T5 13B.

17