Recent comments in /f/MachineLearning

AsIAm t1_j88u9rb wrote

Me too, but it was a Google project, and we all know what Google does to its projects..?

I think relying on TF would have been a mistake. This deep integration approach will be more fruitful in the long run. Also, if anybody wants to do ML in Swift on Apple platforms, there is the excellent MPS Graph.

3

[deleted] OP t1_j88tkgy wrote

To be clear I don't think it has answers about aliens. That is not the point of the post and I don't want it to be a distraction.

The point is that in the public implementation it lies about its abilities as an LLM, while in the API implementation it gives a very normal, safe answer, indicating there is actually no obvious issue with imagining these scenarios.

What is the motivation behind this from an implementation perspective? Why has OpenAI decided it's not capable of this, when it is?

0

t0ns0fph0t0ns OP t1_j88sq1x wrote

>State-of-the-art face recognition models show impressive accuracy, achieving over 99.8% on Labeled Faces in the Wild (LFW) dataset. However, these models are trained on large-scale datasets that contain millions of real human face images collected from the internet. Web-crawled face images are severely biased (in terms of race, lighting, make-up, etc) and often contain labeling noise. Most importantly, these face images are collected without explicit consent, raising more pressing privacy and ethical concerns. To avoid the problems associated with real face datasets, we introduce a large-scale synthetic dataset for face recognition, obtained by photo-realistic rendering of diverse and high-quality digital faces using a computer graphics pipeline. We compare our method to SynFace, a recent method trained on GAN-generated synthetic faces, and reduce the error rate on LFW by 52.5% (accuracy from 91.93% to 96.17%). We first demonstrate that aggressive data augmentation can significantly help reduce the domain-gap between our synthetic faces and real face images. Taking advantage of having full control over the rendering pipeline, we also study how each attribute (e.g., variation in facial pose, accessories, and textures) affects the accuracy. Finally, by fine-tuning the network on a smaller number of real face images that could reasonably be obtained with consent, we achieve accuracy that is comparable to the methods trained on millions of real face images, while alleviating the problems associated with large datasets. microsoft.github.io
>
>video presentation: youtube.com
>
>paper: arxiv.org

4

harharveryfunny t1_j88kg27 wrote

>some of the things that make ML code inscrutable are that (a) every tensor has a shape that you have to guess, and keep track of as it goes through the various layers

That's not inherent to ML though - that's a library design choice to have tensor shape be defined at runtime vs compile time. A while back I wrote my own framework in C++ and chose to go with compile-time shapes, which, as well as preventing shape errors, is more in keeping with C++'s type system. For a dynamically typed language like Python, runtime-defined types/shapes may seem the more natural choice, but it is still a choice nonetheless.
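A rough sketch of the same design choice in Python (illustrative only: `shaped` is a hypothetical helper, and Python can only enforce the declared shapes at call time, whereas a C++ template version would reject mismatches during compilation):

```python
import numpy as np

def shaped(*shapes):
    """Hypothetical helper: declare each argument's shape as part of the
    function's 'type', and reject mismatches before the body runs."""
    def decorator(fn):
        def wrapper(*args):
            for i, (a, s) in enumerate(zip(args, shapes)):
                if a.shape != s:
                    raise TypeError(f"arg {i}: expected shape {s}, got {a.shape}")
            return fn(*args)
        return wrapper
    return decorator

@shaped((10,), (10,))
def add_vectors(a, b):
    return a + b

add_vectors(np.zeros(10), np.ones(10))    # fine
# add_vectors(np.zeros(10), np.ones(5))   # raises TypeError before any math runs
```

The point is where the shape lives: here it is attached to the function's interface rather than discovered (or silently broadcast) deep inside the body.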

2

That_Violinist_18 t1_j88ilse wrote

I keep hearing this argument, but I also keep hearing that models are hitting 60%+ of peak throughput for GPUs when optimizations like FlashAttention and other things are considered.

So how much room is there for alternative architectures when the current hardware only leaves at most 40% of its peak performance on the table?

4

SatoshiNotMe t1_j88fz8v wrote

I agree, some of the things that make ML code inscrutable are that (a) every tensor has a shape that you have to guess, and keep track of as it goes through the various layers, plus (b) layers or operations that you have to constantly look up how they change the tensor shapes.

I’ve settled on two best practices to mitigate these:

  1. Always include the tensor dimensions in the variable name: e.g. x_b_t_e is a tensor of shape (b,t,e), a trick I learned at a Berkeley DRL workshop many years ago.
  2. Einops all the things! https://einops.rocks/

With einops you can express ops and layers in a transparent way by how the tensor dims change. And now suddenly your code is refreshingly clear.

The Einops page gives many nice examples but here’s a quick preview. Contrast these two lines:

    y = x.view(x.shape[0], -1)  # x: (batch, 256, 19, 19)

    y_b_chw = rearrange(x_b_c_h_w, 'b c h w -> b (c h w)')

Yes a little verbose but I find this helps hugely with the two issues mentioned above. YMMV :)
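As a quick sanity check (a minimal sketch, assuming numpy and einops are installed), the einops version computes exactly the same flatten as the opaque `reshape`/`view` call, it just says so in the pattern:

```python
import numpy as np
from einops import rearrange

x_b_c_h_w = np.random.rand(4, 256, 19, 19)  # (batch, c, h, w)

y = x_b_c_h_w.reshape(x_b_c_h_w.shape[0], -1)           # opaque flatten
y_b_chw = rearrange(x_b_c_h_w, 'b c h w -> b (c h w)')  # self-documenting

assert y.shape == y_b_chw.shape == (4, 256 * 19 * 19)
assert np.array_equal(y, y_b_chw)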

5

__lawless t1_j88fo2u wrote

I don’t think you have a sound idea of what you are trying to do. So you want chatGPT + extra!!! What you are asking for does not exist, at least currently. Training a model the size of chatGPT will cost at the very least $5M, and it is absolutely not possible locally. You need a distributed setup, not to mention all the technical difficulties of building such a setup.

5

a_user_to_ask t1_j88f7f6 wrote

The owner of an image is the one who gets to decide how it is used. "All rights reserved" means exactly that: the owner holds the rights to any use of the image, now and for whatever someone invents in the future.

In an ideal world, each image in a machine learning dataset would be identified with its author and license. But I understand that is difficult to achieve, because images get copied around the web and it is hard to locate the original source.

So I have no doubt about the illegality of using images obtained by web scraping. How easy it is to win or lose a lawsuit, and to prove whether you used that data, is another matter.

1

Malignant-Koala t1_j888ezl wrote

>I would like this model to end up functioning like chatgpt. Not only to have it respond like a human/nlp but to also give me full technical answers, descriptions, and just simple specific answers to my questions. I will in the future update the model’s data/knowledge and also train it to do new tasks like image recognition and so on.

Based on your edit, I'm not sure you realize the scope of what you are trying to do. ChatGPT required almost 200 billion parameters, multiple NVIDIA A100s, and many terabytes of RAM to train.

You simply cannot expect to create a general purpose, human-sounding AI that does all of the things you expect to train it to do on a home computer, even if you were somehow a brilliant data scientist.

10

[deleted] t1_j888739 wrote

Not a programming language, but a database solution, called MindsDB.

>MindsDB brings machine learning into databases by employing the concept of AI Tables.
>
>AI Tables are machine learning models stored as virtual tables inside a database. They facilitate making predictions based on your data. You can perform the time series, regression, and classification predictions within your database and get the output almost instantly by querying an AI Table with simple SQL statements.

Edit: yes, I've been watching FireShip))

1

digikar t1_j887j8f wrote

I've been wishing for this ever since I ran into "errors" involving numpy broadcasting along the wrong dimensions. Errors of the kind: you wanted to add a length-10 vector to a (10,10) matrix by treating the vector as a (10,1) matrix, but no one told you anything was wrong, and you spent hours debugging only to learn that a length-10 vector is treated as a (1,10) matrix in this particular case.
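A minimal numpy sketch of that pitfall: adding a length-10 vector to a (10,10) matrix silently broadcasts the vector as a (1,10) row, even when you intended a (10,1) column, and neither line raises an error:

```python
import numpy as np

m = np.arange(100, dtype=float).reshape(10, 10)
v = np.arange(10, dtype=float)

# numpy broadcasts the length-10 vector as a (1, 10) row:
row_wise = m + v                    # adds v to every row, no error raised

# to add v to every column, you must reshape it to (10, 1) yourself:
col_wise = m + v.reshape(10, 1)

# the two results differ, but nothing warns you:
assert not np.array_equal(row_wise, col_wise)
```

A compiler that tracked per-axis dimensions could flag the first line as ambiguous instead of picking an interpretation silently.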

But yup, computing on dimension information to provide static autocompletion as well as dimension checks themselves seems like a huge plus.

Besides compile-time checks, another feature I have wished for is the ability to index array dimensions by name rather than by axis number.
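As an illustration of the named-dimension idea (a hypothetical sketch, not an existing library; `NamedArray` and `take` are made-up names, and xarray offers a full-featured version of this for Python), indexing by dimension name instead of axis number might look like:

```python
import numpy as np

class NamedArray:
    """Hypothetical sketch: wrap an ndarray so axes are addressed by name."""
    def __init__(self, data, dims):
        assert data.ndim == len(dims)
        self.data, self.dims = data, tuple(dims)

    def take(self, **indices):
        """Select by dimension name, e.g. a.take(batch=0, w=1)."""
        out = self.data
        # Process names from the highest axis down so that earlier
        # axis numbers remain valid as axes are removed.
        for name, idx in sorted(indices.items(),
                                key=lambda kv: -self.dims.index(kv[0])):
            out = out.take(idx, axis=self.dims.index(name))
        return out

x = NamedArray(np.arange(24).reshape(2, 3, 4), dims=("batch", "h", "w"))
print(x.take(batch=0, w=1).shape)   # only the "h" axis remains: (3,)
```

The call site then documents itself: `x.take(batch=0, w=1)` cannot be confused about which axis is which, unlike `x[0, :, 1]`.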


For me, Common Lisp coupled with CLTL2 and de facto libraries is the language that comes closest to making this possible. The built-in Common Lisp array types are already fairly extensive, in that they allow specifying the per-axis dimensions, which the compiler can then use for type checking. Common Lisp's SBCL compiler does this in a fair number of cases. For example -

(declaim (inline add-vectors))
(defun add-vectors (a b out)
  (declare (type (simple-array single-float (10)) a b out))
  (loop for i below 10
        do (setf (aref out i)
                 (+ (aref a i)
                    (aref b i))))
  out)

Consider the add-vectors function defined above, which takes three arrays, each with element type single-float and a single axis of length 10, adds the first two arguments a and b element-wise, and stores the result into out.

Then, if you try to compile the following function:

(defun add-to-first (x y)
  (declare (type (simple-array single-float (10)) x)
           (type (simple-array single-float (05)) y))
  (add-vectors x y x))

The SBCL compiler actually signals a warning during compilation itself:

; processing (DEFUN ADD-TO-FIRST ...)

; file: /tmp/slimeBh9nFb
; in: DEFUN ADD-TO-FIRST
;     (ADD-VECTORS X Y X)
;
; note: deleting unreachable code
;
; caught WARNING:
;   Derived type of COMMON-LISP-USER::Y is
;     (VALUES (SIMPLE-ARRAY SINGLE-FLOAT (5)) &OPTIONAL),
;   conflicting with its asserted type
;     (SIMPLE-ARRAY SINGLE-FLOAT (10)).
;   See also:
;     The SBCL Manual, Node "Handling of Types"
;
; compilation unit finished
;   caught 1 WARNING condition
;   printed 1 note

But that said, Common Lisp leaves many things wanting. Not only are there no parametric types, its type system also has no formal structure like the Hindley-Milner type system. Coalton is an attempt to bridge this and bring HM-based type checking to Common Lisp.

However, even with HM, the notion of per-axis dimensions is hard, although doable. With a fair bit of macrology over the past two years, some of us* have come up with something that allows the following:

(in-package :polymorphic-functions)

(push (lambda (x)
        (member x '(<m> <len> <type>)))
      *parametric-type-symbol-predicates*)

(defpolymorph pf-add-vectors ((a   (simple-array <type> (<len>)))
                              (b   (simple-array <type> (<len>)))
                              (out (simple-array <type> (<len>))))
    (simple-array <type> (<len>))
  (loop :for i :below <len>
        :do (setf (aref out i)
                  (+ (aref a i)
                     (aref b i))))
  out)

And then if one tries to compile the add-to-first defined above:

(defun add-to-first (x y)
  (declare (type (simple-array single-float (10)) x)
           (type (simple-array single-float (05)) y)
           (optimize safety))
  (pf-add-vectors x y x))

One gets the following compiler note:

; processing (DEFUN ADD-TO-FIRST ...)
; While compiling
;     (PF-ADD-VECTORS X Y X)
;   Following notes were encountered:
;
;     No applicable POLYMORPH discovered for polymorphic-function
;       PF-ADD-VECTORS
;     and ARGS:
;
;       (X Y X)
;
;     derived to be of TYPES:
;
;       ((SIMPLE-ARRAY SINGLE-FLOAT (10)) (SIMPLE-ARRAY SINGLE-FLOAT (5))
;        (SIMPLE-ARRAY SINGLE-FLOAT (10)))
;
;     Available Effective-Type-Lists include:
;
;       ((SIMPLE-ARRAY <TYPE> (<LEN>)) (SIMPLE-ARRAY <TYPE> (<LEN>))
;        (SIMPLE-ARRAY <TYPE> (<LEN>)))

And the following compiles successfully:

(defun add-to-first (x y)
  (declare (type (simple-array single-float (10)) x)
           (type (simple-array single-float (10)) y)
           (optimize speed))
  (pf-add-vectors x y x))

And it compiles to fairly optimal code when declared so:

; disassembly for ADD-TO-FIRST
; Size: 149 bytes. Origin: #x53BD456C                         ; ADD-TO-FIRST
.
.
.
; 5D0: L0:   F30F104C4E01     MOVSS XMM1, [RSI+RCX*2+1]
; 5D6:       F30F10544F01     MOVSS XMM2, [RDI+RCX*2+1]
; 5DC:       F30F58D1         ADDSS XMM2, XMM1
; 5E0:       F30F11544E01     MOVSS [RSI+RCX*2+1], XMM2
.
.
.

Note that there are no parametric types in the sense of HM types; this is rather a symbol-substitution and declaration-propagation strategy. Regardless, it is very much rudimentary, has no formal semantics, and at the current rate I expect it to take several years to reach maturity. But yeah, someone with a background in (dependent) type theory and the time to implement and debug it is certainly welcome to experiment with Common Lisp and SBCL to see what might be possible :D.

3