Recent comments in /f/MachineLearning

BrotherAmazing t1_j4v2gm8 wrote

First, I’m blown away that you are suggesting you don’t know your students and their writing styles, some of which is done in class and almost all of which differs significantly from the way ChatGPT writes. Second, my teachers said the exact same thing you are saying decades ago and freaked out when CliffsNotes came out!

Re-read my prior argument because nothing you just said impacts it, and it still stands.

2

starstruckmon t1_j4uufbc wrote

You don't really need a separate extension, do you? Your bot can just be another user submitting the timestamps.

Though it would help if the extension developer provided a list of videos that are being watched by their users but have no timestamps yet, so your bot isn't spending time scraping through unpopular videos.

1

Iljaaaa t1_j4uub0z wrote

I have an autoencoder input of 100x21. The 21 columns are PC scores, the 100 rows are observations. The importance of the columns degrades as the column number increases. The first column is the most important for the data variance, the last column is the least important. To be able to reconstruct the data back from PCA the first columns need to be as correct as possible.

I have tried searching whether I can adjust weights or something else of the autoencoder layers to include this importance of the columns, but I have not found it.

In other words, I want errors in the first (e.g. 5) columns to be punished more harshly than errors in the last (e.g. 5) columns.

I would be grateful if someone could point me in the right direction!
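One common way to express this kind of column importance is a weighted reconstruction loss, where each column's squared error is multiplied by its own weight. The following is only a minimal NumPy sketch under assumptions: the linearly decaying weights and the names `column_weights` and `weighted_mse` are hypothetical, and weights taken from the PCA explained-variance ratios would be another reasonable choice. The same expression could serve as the training loss in whichever framework the autoencoder is built in.

```python
import numpy as np


def column_weights(n_cols, first=5.0, last=1.0):
    """Hypothetical weighting: decays linearly from `first` to `last`,
    then rescaled so the weights average to 1."""
    w = np.linspace(first, last, n_cols)
    return w / w.mean()


def weighted_mse(x, x_hat, weights):
    """Mean squared error with a per-column weight on each squared error.

    x, x_hat: arrays of shape (n_rows, n_cols)
    weights:  array of shape (n_cols,), broadcast over the rows
    """
    sq_err = (np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)) ** 2
    return float(np.mean(sq_err * weights))


# Example with the 100x21 shape from the question (random stand-in data).
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 21))
w = column_weights(21)

# The same error placed in column 0 costs more than in column 20.
err_first = np.zeros_like(x)
err_first[:, 0] = 1.0
err_last = np.zeros_like(x)
err_last[:, 20] = 1.0
loss_first = weighted_mse(x, x + err_first, w)
loss_last = weighted_mse(x, x + err_last, w)
```

Because the weights average to 1, the loss stays on roughly the same scale as a plain MSE, so learning rates tuned for the unweighted loss should not need large changes.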

2

iqisoverrated t1_j4ur0re wrote

...or...you could just get the youtube adblock/sponsorblock skip extension (dunno exactly what it's called, SkipAdTrigger or something? I cannot check my home machine at the moment...but it's available for Firefox and I'm pretty sure something similar must exist for other browsers as well).

Works well in my experience. It automatically skips sponsorblocks and marks them as green on the time bar (so you can manually watch them if you're into that kinda thing. Hey, there's all kinds of kinks out there. Don't judge.)

1

float16 t1_j4uqlls wrote

2 seems doable. Not everybody has to have a GPU, but I bet lots of people, including me, would rather spin up the GPU in their personal computer for a few seconds than manually specify where skippable segments are.

The one central server thing bugs me. I'd prefer something like "query your nearest neighbors and choose the one with the most recent data." No idea how to do that though; not a systems person.

1

monkeysingmonkeynew OP t1_j4un2xm wrote

OK, I can almost see this working, thanks for the suggestion. The only thing that would prevent me from implementing this solution is that by taking the sum of the two models, it would let m_2 contribute as much to the result as m_1. However, I expect a single day's data to be noisy, so I would need the contribution of the new day's data to be down-weighted somehow.
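One simple way to down-weight the noisier model is a convex combination instead of a plain sum. This is only a sketch under assumptions: it treats each model's output as a numeric prediction, and `old_pred`, `new_pred`, `blend`, and the default `alpha=0.1` are all hypothetical, since the thread does not say what the models output or which of the two is fit on the new day's data.

```python
import numpy as np


def blend(old_pred, new_pred, alpha=0.1):
    """Weighted average of two predictions.

    alpha is the share given to the new (noisier) prediction:
    alpha=0 ignores it entirely, alpha=0.5 weights both equally.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    old_pred = np.asarray(old_pred, dtype=float)
    new_pred = np.asarray(new_pred, dtype=float)
    return (1.0 - alpha) * old_pred + alpha * new_pred


# The new day's prediction only nudges the established one.
combined = blend([1.0, 2.0], [3.0, 4.0], alpha=0.1)
```

Applied day after day, this is the update rule of an exponential moving average, so older days fade out gradually rather than being dropped.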

1

MrSpotgold OP t1_j4umjvh wrote

ChatGPT is beyond cheating. We have to go on the default that it is applied in essay writing. And surely you agree that it is pointless to assess the output of a machine. Therefore, essay writing will cease to be a method of assessment, and consequently, whichever way you look at it, future students will no longer learn to write.

0

Philpax t1_j4uk4ws wrote

Honestly, I'm not convinced it needs a hugely complex language model, as (to me) it seems like a primarily classification task, and not one that would need a deep level of understanding. It'd be a level or two above standard spam filters, maybe?

The two primary NN-in-web solutions I'm aware of are tf.js and ONNX Runtime Web, both of which do CPU inference, but the latter is developing some GPU inference. As you say, it only needs to be done once, so having a button that scans through the transcript, assigns each sentence a probability of being a sponsor read, and then automatically selects the segment boundaries from those probabilities seems readily doable. Even if it takes some noticeable amount of time for the user, it's pretty quickly amortised across the entire viewing population.
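The boundary-selection step could be as simple as thresholding the per-sentence probabilities and merging consecutive hits. A rough sketch, assuming the classifier has already produced `(start_seconds, end_seconds, probability)` for each transcript sentence; the function name `sponsor_segments`, the tuple layout, and the 0.5 threshold are all hypothetical:

```python
from typing import List, Tuple

Sentence = Tuple[float, float, float]  # (start_seconds, end_seconds, probability)


def sponsor_segments(sentences: List[Sentence],
                     threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Merge runs of consecutive sentences whose sponsor probability
    is at or above `threshold` into (start, end) segments."""
    segments = []
    seg_start = None
    seg_end = None
    for start, end, prob in sentences:
        if prob >= threshold:
            if seg_start is None:
                seg_start = start      # open a new segment
            seg_end = end              # extend the current segment
        elif seg_start is not None:
            segments.append((seg_start, seg_end))  # close the segment
            seg_start = None
    if seg_start is not None:          # transcript ended inside a segment
        segments.append((seg_start, seg_end))
    return segments


transcript = [
    (0.0, 5.0, 0.1),
    (5.0, 10.0, 0.9),
    (10.0, 15.0, 0.8),
    (15.0, 20.0, 0.2),
    (20.0, 25.0, 0.7),
]
segments = sponsor_segments(transcript)
```

A real version would probably smooth the probabilities first, or drop very short segments, so that a single misclassified sentence does not split or create a segment.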

The only real concern I'd have at that point is... is it worth it for the average user over just hitting the right arrow two times and/or manually submitting the timestamps themselves? I suspect that's why it hasn't been done yet.

2

FastestLearner OP t1_j4uj5oy wrote

I don't have much experience with the cost of training NLP models (I work mostly in Vision). But I think if you can get a product out with just enough accuracy to get heads turning in your favour, you could always scale up the model later down the road. Alternatively, you could have a donate button on the extension's settings page (which many extensions do). If you do get some donations, you could use them to update the model later on. It could be crowd-sourced and crowd-funded simultaneously.

1