Recent comments in /f/MachineLearning

duck_mopsi t1_j6ibwq0 wrote

I am trying to build a GAN with RNNs, so I'm creating stacked GRU cells that get fed the random input. I implemented it as follows:

from tensorflow import keras

def build_generator():
    # LATENT_SHAPE is a constant I define elsewhere
    inputs = keras.Input(shape=[LATENT_SHAPE])
    cell = keras.layers.StackedRNNCells(
        [keras.layers.GRUCell(64, activation='tanh') for _ in range(7)])
    rnn = keras.layers.RNN(cell, return_sequences=True)
    x = rnn(inputs)
    return keras.models.Model(inputs, x)

However, every time I try to call the method, I get the following error:

Error

I found basically the same StackedRNNCells implementation in the second-to-newest push of TimeGAN, yet I get this error and don't know how to fix it.
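
In case it's relevant: my understanding is that keras.layers.RNN expects 3-D input of shape (batch, timesteps, features), while the latent input above is only 2-D, so maybe that's the mismatch? A sketch of what I mean (SEQ_LEN and LATENT_DIM are placeholder names I made up for the time and feature dimensions):

import numpy as np
from tensorflow import keras

SEQ_LEN = 24      # placeholder: number of timesteps per sample
LATENT_DIM = 32   # placeholder: size of the noise vector at each step

def build_generator_3d():
    # RNN layers want 3-D input: (batch, timesteps, features)
    inputs = keras.Input(shape=[SEQ_LEN, LATENT_DIM])
    cell = keras.layers.StackedRNNCells(
        [keras.layers.GRUCell(64, activation='tanh') for _ in range(7)])
    rnn = keras.layers.RNN(cell, return_sequences=True)
    return keras.models.Model(inputs, rnn(inputs))

noise = np.random.normal(size=(8, SEQ_LEN, LATENT_DIM)).astype('float32')
print(build_generator_3d()(noise).shape)  # expect (8, 24, 64)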

1

EmmyNoetherRing t1_j6i8xfv wrote

I hate to say it, but I think the actual answer to “as compared to what” is “as compared to my human professor”.

People using it to learn are having interactions that mimic interactions with teachers/experts. When they mention hallucinations, I think it’s often in that context.

4

grenouillefolle t1_j6i36hx wrote

I have a (seemingly) simple question concerning systematic studies of classification problems. Is there any literature (books, papers) describing an approach to systematic studies of classifiers: varying the size of the training sample, the number of input variables, the strength of the correlation between input variables and classes in simulated data, the type of classifier, the configuration of the algorithm's parameters, etc.?

The goal is to establish the robustness and limitations of the method before training on real data. While I have a good sense of what can and should be done, I want to point a beginner in the right direction for a project without doing all the hard work myself. Roughly, I have something like the sketch below in mind.
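
A minimal sketch of such a study, assuming scikit-learn and simulated data (all the parameter grids here are placeholders):

import itertools
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Grid over the study dimensions: sample size, class separability
# (a proxy for how strongly inputs correlate with the class), classifier.
for n_samples, class_sep, clf in itertools.product(
        [200, 1000, 5000],
        [0.5, 1.0, 2.0],
        [LogisticRegression(max_iter=1000), RandomForestClassifier()]):
    X, y = make_classification(n_samples=n_samples, n_features=20,
                               n_informative=5, class_sep=class_sep,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(n_samples, class_sep, type(clf).__name__, round(acc, 3))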

2

tysam_and_co OP t1_j6hxgzk wrote

Hi hi hiya there! Great questions, thanks so much for asking them! :D

For the dataloaders, that dataloading only happens once -- after that, it's just saved on disk as a tensor array in fp16. It's wayyyyy faster for experimentation this way. We only need to load the data once, then we move it to GPU, then we just dynamically slice it on the GPU each time! :D
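
If it helps, here's the rough shape of the idea as a toy sketch (not the actual repo code; it assumes a CUDA device and made-up sizes):

import torch

# One-time cost: build the dataset tensor and cache it to disk in fp16.
data = torch.randn(50000, 3, 32, 32)        # stand-in for the real dataset
torch.save(data.half(), 'train_data.pt')

# Every run afterwards: load once, move to GPU once...
train = torch.load('train_data.pt').to('cuda')

# ...then every batch is just a dynamic slice, entirely on-GPU.
for step in range(10):
    idx = torch.randint(0, len(train), (512,), device='cuda')
    batch = train[idx]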

As for self.se, that used to be a flag for the squeeze_and_excite layers. I think it's redundant now as it's just a default thing -- this is a one person show and I'm moving a lot of parts fast so there's oftentimes little extraneous bits and pieces hanging around. I'll try to clean that up on the next pass, very many thanks for pointing that out and asking!

I'm happy to answer any other questions that you might have! :D

1

tysam_and_co OP t1_j6hx45k wrote

Thanks for sharing. I think you might be missing some of the bigger picture here! Most of the changes and performance improvements did indeed come from changing the architecture, memory format, execution order, network width, etc. in the right places. These come from about five years of prior experience where my primary task was architecting networks like this. I transferred a number of personal lessons learned into this network to get a lot of the benefits that we have here. So I'm not quite sure why they would suddenly not scale to other problems! ;P That said, there might be some tweaks needed to line up with the inductive biases of the network on different datasets (in this case, say, 1-2 more downscaling blocks for ImageNet, or something like that).

I also wouldn't focus on the hyperparameter twiddling that much -- though it is important and can definitely be a trap. When chasing a world record, every option is on the table, and hyperparameters promise results but are exponentially more expensive to work with. Outside of that regime, though, the 'good enough' parameter space should be pretty flat, so this is likely not too bad of a starting place.

I'm a bit curious about how this would not be reproducible on another dataset (especially if we're narrowing our inductive space -- this should increase generalization, not reduce it!). Similar to Transformers, the simpler and more scalable this architecture is, the better. One of my go-tos for people newer to the field is to encourage them to keep things as simple as possible. It pays off!

In this case, for example, before release, I just added 70 epochs and doubled the base width, and went from 94.08% to 95.77%. That's a good sign! It should at least have good basic performance on other datasets, and if something has to be changed, it's probably just a few hyperparameters, and not all of them, if that makes sense.

2

Dry-Tomatillo449 t1_j6htvqh wrote

An AI content detector is artificial-intelligence software used to detect and analyze content from sources such as images, audio, or video. It can identify objects in images, detect text in audio and video recordings, or find relevant topics in documents. These detectors can automate tasks such as content curation, filtering, and recommendation, and can inform decisions about what content to include on a website, blog, or other online material. They can also identify and classify images, audio, video, and text to better understand the content and provide more relevant results.

1

EsEsMinnowjohnson t1_j6hrhaq wrote

Yes for an MS -- a bunch of schools are dedicated to building online platforms (e.g. Oregon State) -- but it's really hard to find online PhD programs. It looks like NDSU offers one in CS, if you're not too picky about prestige (go Bison!).

I'm currently in Oregon State's MS in Environmental Sciences program, doing research on remote sensing and tree physiology (obviously not CS, but a useful enough anecdote). Here are the main things to know about an online program of this nature:

  • you pretty much have to be self-funded. Research and teaching assistantships aren't unheard of, but you'd likely be waiting a while (years) for an opportunity to open up.
  • some institutions offer a choice between coursework only or a full thesis/dissertation. The former is easier and often doesn’t require a major advisor. The latter always does.
  • one of the main reasons people are declined from the OSU program I’m in is their failure to secure a major advisor. This is probably true a lot of places. It’s hard to find faculty that are comfortable with taking on a remote student AND have data that already exist and can be processed remotely for a meaningful thesis. In my case my advisor has field sensors set up in a ponderosa pine stand and has 2 years of data that we’ll use to validate models based on existing Landsat data. That’s a great project because it requires no lab/field work or specialized equipment.

So far I’ve really enjoyed the OSU program, and I think we’ll continue to see a lot more of this in the future.

2

Doriens1 t1_j6hkxl4 wrote

In my experience, I've never seen a PhD program advertising remote work, but in practice it heavily depends on the advisor and the research team.

I joined two different teams (both from the same lab) during my PhD. I don't like remote work much, so I basically came to the lab every day.

In the first team, only two of us came to the lab regularly while all the other researchers were at home. In the second team, however, almost everybody was there.

Now you might say that it was during the pandemic, but it remains true even today.

Usually, researchers are pretty open to discussion when you apply for a PhD, so this is something you'll have to discuss with them to learn the habits of the team.

1

fmai t1_j6hjauf wrote

GPT-3 ranks relatively low on SuperGLUE because it was not finetuned on the SuperGLUE tasks, whereas T5 etc. were. The amazing thing about GPT-3 is that you can reach impressive performance with just few-shot prompting, which was unheard of before.

As to your questions:

  1. AFAIK, OpenAI hasn't published any numbers themselves, and nobody outside of OpenAI has API access to ChatGPT yet, which makes it difficult to assess its performance on the thousands of examples a benchmark often contains. So, no, the performance improvement hasn't been quantified so far.

  2. No, there is no quantitative analysis. Most people seem to agree that, anecdotally, ChatGPT seems to hallucinate far less than GPT-3. But you can definitely get ChatGPT to generate bullshit if you keep digging, so it's far from perfect. Depending on what story you want to tell, some people will emphasize one or the other. Take it all with a grain of salt until we get solid numbers.

  3. AFAIK, LLMs are fantastic at closed-book question answering, where you're not allowed to look at external resources. I think a T5-based model was the first to show that it could answer trivia questions well from knowledge stored in the model parameters alone. For open-book QA you will need to augment the LLM with some retrieval mechanism (which ChatGPT doesn't have yet), so you can expect retrieval-augmented models to be much better in this regard -- see the sketch below.
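
To make the retrieval step concrete, here's a toy sketch (TF-IDF as a stand-in for a real learned retriever; the corpus, question, and prompt format are all made up):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus and question -- stand-ins for a real document store.
docs = ["The Eiffel Tower is 330 metres tall.",
        "Mount Everest is the highest mountain on Earth."]
question = "How tall is the Eiffel Tower?"

# Retrieve the most relevant passage for the question.
vec = TfidfVectorizer().fit(docs + [question])
sims = cosine_similarity(vec.transform([question]), vec.transform(docs))
best = docs[sims.argmax()]

# Prepend the retrieved passage so the LLM can answer open-book.
prompt = f"Context: {best}\nQuestion: {question}\nAnswer:"
print(prompt)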

9

tectoniteshade t1_j6hj3ic wrote

While the number and sophistication of AI tools have taken a sharp upward turn, there's one particular type of tool I tried and failed to find: one that changes the facial expression in a photograph or other still image. I found some toy-like phone apps with very limited sets of expressions. The most professional tool I was able to find was Photoshop's neural filters, which were introduced a couple of years ago already, so one would think more advanced specialized tools for this purpose would exist by now. Are there such tools? Did my google-fu just fail?

1

Jean-Porte t1_j6hif9e wrote

T5 is fine-tuned on supervised classification tasks and trained to output the labels as text. That's why it outperforms GPT-3.
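
Concretely, the "classification" is literally text generation. A rough sketch with HuggingFace transformers (assuming I remember the task prefix right -- "sst2 sentence:" is one of the prefixes T5 was fine-tuned with):

from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# The label comes out as generated text, not as a softmax over classes.
inputs = tok("sst2 sentence: this movie was great", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=4)
print(tok.decode(out[0], skip_special_tokens=True))  # e.g. "positive"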

Generative models are not as good as discriminative models at discriminative tasks. A carefully tuned DeBERTa is probably better than ChatGPT, but ChatGPT has a user-friendly text interface. And GLUE-style evaluation is not charitable to ChatGPT's capabilities: the model might internally store the answer yet be misaligned with the benchmark.

I always wonder why we don't try to scale up discriminative models. DeBERTa-xxlarge is "only" 1.3B parameters, and it outperforms T5 13B.

17