- I've been doing more research in the AI/ML arena. There are obviously a lot of free resources out there regarding this. Strangely, source code repositories such as GitHub seem to be a good source of pointers to free books in particular. I've found that universities often have a lot of free material as well
http://www.mlebook.com/wiki/doku.php
http://incompleteideas.net/book/RLbook2018.pdf
http://incompleteideas.net/book/the-book-2nd.html
https://korbit.ai/machinelearning
statsoft
http://www.statsoft.com/Textbook/Free-Resources
http://www.statsoft.com/Textbook
http://www.statsoft.com/Portals/0/Products/Data-Mining/ShortCourseInDataMining.pdf
https://github.com/finos
search open s3 buckets
https://portswigger.net/daily-swig/new-tool-helps-you-find-open-amazon-s3-buckets
https://buckets.grayhatwarfare.com/results/machine%20learning
http://jermsmit.com/how-to-search-for-open-amazon-s3-buckets-and-their-contents/
free certificate short course data mining
machine learning bash
https://towardsdatascience.com/how-to-invest-like-a-data-scientist-bada78787d57
github ai books
https://github.com/aibooks/aibooks.github.io
https://github.com/zslucky/awesome-AI-books
https://github.com/lahorekid/AI-Books
https://github.com/Racim/Free-Artificial-Intelligence-Books
- obviously, the free resources tend to be linked to Free and Open Source (FOSS) software though. If you have limited Internet access, another option is to get cheap textbooks from op shops. Learn to speed read or use summarisation software if you want to get through the texts quickly
rstudio pdf manual
./mass_downloader.sh https://dss.princeton.edu/training/ documents
- one thing I've come to realise is that a lot of algorithms could be expressed in simpler ways that don't require anything akin to basic AI/ML? This also results in faster process execution
neural network linux cli
- the main problem I have with working from existing software is that a lot of the time it doesn't cover every eventuality that you need/want to deal with. That said, a lot of intermediate work needs to be done to get where I want. One thing I've been wondering is how far you can push science with pure analytics via automated analysis style techniques?
what can you do with rapidminer
- it feels like there's nothing out there that does exactly what I want so I'll build something
autocorrelation csv script linux bash
cross tabulation
cross tabulation linux cli
frequency distribution numbers csv
./mass_downloader.sh https://cosmosweb.champlain.edu/people/stevens/WebTech/ documents
http://www.silota.com/docs/recipes/sql-histogram-summary-frequency-distribution.html
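Related to the frequency distribution and cross tabulation searches above, a minimal CLI sketch, assuming a comma-separated data.csv (a hypothetical file) with the fields of interest in columns 1 and 2:
cut -d',' -f2 data.csv | sort -n | uniq -c | sort -rn | head    # frequency of each distinct value
cut -d',' -f2 data.csv | awk '{b=int($1/10)*10; n[b]++} END {for (b in n) print b, n[b]}' | sort -n    # histogram with bin width 10
awk -F',' '{n[$1","$2]++} END {for (k in n) print k, n[k]}' data.csv    # crude cross tabulation of columns 1 and 2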
pattern recognition r
https://cran.r-project.org/web/packages/bpa/vignettes/introduction.html
https://www.r-bloggers.com/simple-pattern-detection-in-numerical-data/
https://www.jdatalab.com/data_science_and_data_mining/2017/03/20/regular-expression-R.html
https://gist.github.com/casallas/8374933
https://en.wikipedia.org/wiki/Heuristic
https://gitlab.com/alfiedotwtf/metaheuristics
https://gitlab.com/users/alfiedotwtf/projects
mini data mining open source
https://www.softwaretestinghelp.com/data-mining-tools/
https://thenewstack.io/six-of-the-best-open-source-data-mining-tools/
https://opensourceforu.com/2017/03/top-10-open-source-data-mining-tools/
https://en.wikipedia.org/wiki/Data_mining
R Analytical Tool To Learn Easily
https://rattle.togaware.com/
https://rattle.togaware.com/rattle-install-mswindows.html
https://rattle.togaware.com/rattle-install-linux.html
https://bitbucket.org/kayontoga/rattle/src/master/
yara auto generation
https://github.com/Xen0ph0n/YaraGenerator
https://github.com/Neo23x0/yarGen
https://www.andreafortuna.org/2017/04/27/two-open-source-tools-to-easily-generate-yara-rules/
https://github.com/Yara-Rules
https://github.com/Xen0ph0n
https://www.heroku.com/home
https://www.heroku.com/pricing
https://discovertext.com/
https://alternativeto.net/software/discovertext/
- auto-correlation is time intensive, particularly as the total number of attributes goes up. It's a question of combinatorics, permutations, etc.? For 6 symbols there are 6! orderings:
echo "6*5*4*3*2*1" | bc
720
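The same calculation generalises to any n (GNU seq builds the product expression):
seq -s'*' 1 6 | bc    # 6! = 720; change 6 for other n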
Should you come up with indexes, basic metrics, etc... that better outline what is happening? Getting the average of the entire group or picking out specific attributes is the obvious way to do this?
- tsv-utils from eBay seems useless. Got heaps of weird compilation errors using the local D compiler, presumably because I was using a different compiler than the one eBay used. D reminds me of a cross between C++ and Lisp. Had to make a lot of modifications to get things to compile
- as an aside, I wanted to better understand how a lot of S3 enumerators worked. Genuinely pretty simple. I used https://buckets.grayhatwarfare.com/ to look up some stuff on Machine Learning. Simply truncating URLs back to the bucket root leads you to an XML-based index file which you can parse and search for material that you may be interested in (see the sketch after these example links)
http://rcomm.s3.amazonaws.com/Building%20a%20Large%20Scale%20Machine%20Learning-Based%20Anomaly%20Detection%20System,%20Part%201%20-%20Design%20Principles.pdf
http://rcomm.s3.amazonaws.com/
https://buckets.grayhatwarfare.com/results/machine%20learning/40
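A rough sketch of the approach, using the rcomm bucket above (keys.txt is an arbitrary output file): an unauthenticated GET on the bucket root returns a ListBucketResult XML document, and the <Key> elements can be pulled out with standard tools:
curl -s "http://rcomm.s3.amazonaws.com/" | grep -o '<Key>[^<]*</Key>' | sed 's/<\/*Key>//g' > keys.txt
while read -r key; do echo "http://rcomm.s3.amazonaws.com/${key// /%20}"; done < keys.txt    # turn keys into fetchable URLs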
- it's not much different from the domain resolution tools that I've written in the past? Note how easy online source code repositories have made it to learn how a lot of tools work. If you want to build a search engine, just integrate a search component with a crawler component. I'm assuming a lot of publicly available material isn't a security failure and that much of it is genuinely supposed to be free, but then again who knows in the age of outsourcing, ad-hoc/dodgy IT people, hybrid warfare, and liberalism?
https://github.com/jordanpotti/AWSBucketDump
https://github.com/jordanpotti/AWSBucketDump/blob/master/AWSBucketDump.py
https://dtbnguyen.blogspot.com/2018/09/subdomain-resolve-security-script.html
https://dtbnguyen.blogspot.com/2018/09/web-traverser-security-script-random.html
http://dtbnguyen.blogspot.com/2020/01/finding-patterns-in-data-genetic-data.html
https://dtbnguyen.blogspot.com/2019/10/web-crawler-script-random-stuff-and-more.html
https://dtbnguyen.blogspot.com/2019/11/mini-search-engine-prototype-random.html
https://dtbnguyen.blogspot.com/2016/04/hybrid-warfare-more-psyops-and-more.html
- one thing I've realised is that I just don't have the raw computing power, so I have to optimise as much as I can from my side at all times. Reduce the search space, stick to smarter rather than stronger mining, stick to quicker algorithms when possible, etc.? Speed of operation is a problem all around on larger data sets and more complex analysis. There are heaps of different ways to optimise operations if you look across the board. It makes sense that you'd try to shave thousandths of a second off operations that are run 100K+ times
http://dtbnguyen.blogspot.com/2020/01/finding-patterns-in-data-genetic-data.html
https://dtbnguyen.blogspot.com/2019/10/web-crawler-script-random-stuff-and-more.html
https://dtbnguyen.blogspot.com/2019/11/mini-search-engine-prototype-random.html
https://dtbnguyen.blogspot.com/2019/09/big-data-and-social-trading-investments.html
fastest crawler web rate
smallest medical database
fastest isp
- may need a custom benchmarking suite of some sort (CPU, RAM, HDD/SSD, etc.)? If you understand how general computer interaction works with large datasets then you'll understand how big a problem this is. Sometimes bottlenecks exist in places you didn't expect
- there are auto tuning scripts and software available but they have limited potential for speed gains
autotune performance script linux
- you'll note that a combination of compiled and scripting languages is quite common in this area. The obvious frustration is that scripting languages are quick to program but slow to run compared with compiled languages. I built a language converter for scripting languages a while back. May have to add the ability to convert to compiled languages as well? C++/Java/Python are the main languages you seem to be dealing with in the AI/ML space
compile bash script
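shc is the usual answer to that search: it wraps the script in generated C and compiles that, though it's more obfuscation than a real speed-up since a shell still interprets the payload at runtime (a sketch, assuming the shc package is installed; myscript.sh is a placeholder):
shc -f myscript.sh -o myscript    # newer shc; older versions emit myscript.sh.x instead
./myscript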
- Intel SpeedStep technology often doesn't work the way you want it to. The CPU rarely runs at capacity, so set the correct clock speed manually
change clock speed linux cli
cpufreq-info
cpufreq-set
cat /proc/cpuinfo
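For example, with cpufrequtils (a sketch; governor and frequency availability depend on the hardware and driver):
cpufreq-info                              # list supported frequencies and governors per core
sudo cpufreq-set -c 0 -g performance      # hold core 0 at the performance governor
sudo cpufreq-set -c 0 -g userspace        # or take manual control...
sudo cpufreq-set -c 0 -f 2000MHz          # ...and pin an explicit frequency (userspace governor only)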
- way back when, I had to look at High Performance Computing options. At the time there wasn't much around, and even now the best stuff isn't really the best solution for lower end problem sets. As I said earlier, I'm thinking about building my own solution to automatically connect remote filesystems. It will probably be a combination of personal scripts and FOSS software, because a lot of the high end stuff is overkill and too slow, while a lot of attempts by others just aren't advanced enough or appropriate to the situation (I've written books on these topics in the past so this shouldn't be too difficult)? On an air-gapped/isolated/secure/separate network things are easier because you don't really have to factor in security as much. You can create a combination of scanners and microservices to detect changes in network configuration, and automount remote resources such as file shares, CPU, network, etc. Google and Amazon have their own respective suites of systems such as Borg and AWS
https://www.linode.com/docs/networking/ssh/using-sshfs-on-linux/
sudo mkdir /mnt/droplet    # replace "droplet" with whatever name you prefer
sudo sshfs -o allow_other,defer_permissions root@xxx.xxx.xxx.xxx:/ /mnt/droplet
sudo sshfs -o allow_other,defer_permissions,IdentityFile=~/.ssh/id_rsa root@xxx.xxx.xxx.xxx:/ /mnt/droplet
sudo umount /mnt/droplet
sudo nano /etc/fstab
sshfs#root@xxx.xxx.xxx.xxx:/ /mnt/droplet
https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh
https://en.wikipedia.org/wiki/SSHFS
alternative to sshfs
https://superuser.com/questions/344255/faster-way-to-mount-a-remote-file-system-than-sshfs
https://alternativeto.net/software/sshfs/
https://www.reddit.com/r/archlinux/comments/vi6vq/sshfs_alternative/
- most distributed filesystem options aren't great for what I need? Most distributed filesystems basically implement RAID across the network. Looked at blockchain and P2P but they're too inefficient for this? Maybe use a sentinel file on each system? TFTP, mini-HTTP, etc... The search goes on
python -m SimpleHTTPServer    # Python 2
python3 -m http.server        # Python 3 equivalent
distributed filesystem linux
https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems
https://en.wikipedia.org/wiki/Clustered_file_system
https://en.wikipedia.org/wiki/List_of_file_systems#Distributed_file_systems
https://en.wikipedia.org/wiki/Google_File_System
https://en.wikipedia.org/wiki/File:GoogleFileSystemGFS.svg
https://en.wikipedia.org/wiki/Bigtable
https://en.wikipedia.org/wiki/MinIO
distributed file system
https://github.com/robertdavidgraham/masscan
distributed filesystem
https://www.reddit.com/r/compsci/comments/4vsvst/distributed_filesystem_for_billions_of_small_files/
https://github.com/chrislusf/seaweedfs
http://www.xtreemfs.org/
https://serverfault.com/questions/19257/distributed-storage-filesystem-which-one-is-there-a-ready-to-use-product
https://moosefs.com/
https://github.com/moosefs/moosefs
aufs over http
https://www.thegeekstuff.com/2013/05/linux-aufs/
mount filesystem over http linux
https://unix.stackexchange.com/questions/67568/mount-http-server-as-file-system
https://www.tecmint.com/sshfs-mount-remote-linux-filesystem-directory-using-ssh/
https://github.com/fangfufu/httpdirfs
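httpdirfs looks like the closest fit for read-only mounts over plain HTTP (a sketch; the URL and mountpoint are placeholders):
mkdir -p /tmp/httpmnt
httpdirfs http://example.com/files/ /tmp/httpmnt    # mounts an HTTP directory listing via FUSE
fusermount -u /tmp/httpmnt                          # unmount when done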
- there is no genuine means of achieving a simple, easy Linux cluster at the moment? I'm probably better off building something myself?
open source cluster software
https://en.wikipedia.org/wiki/Comparison_of_cluster_software
https://en.wikipedia.org/wiki/List_of_cluster_management_software
http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm
https://sourceforge.net/directory/system-administration/clustering/os:linux/
https://sourceforge.net/projects/pelicanhpc/files/
https://pelicanhpc.org/
https://sourceforge.net/projects/openmosix/files/
https://sourceforge.net/projects/gxp/files/
https://sourceforge.net/projects/dsh/
https://sourceforge.net/projects/arielcluster/files/
https://sourceforge.net/projects/brainnt/files/
https://en.wikipedia.org/wiki/Foreman_(software)
https://www.theforeman.org/
https://en.wikipedia.org/wiki/Salt_(software)
easy linux cluster
http://www.rocksclusters.org/
http://www.rocksclusters.org/docs.html
https://computing.llnl.gov/tutorials/linux_clusters/
simple cluster linux cli
https://computing.llnl.gov/tutorials/linux_clusters/
https://www.linux.com/tutorials/building-beowulf-cluster-just-13-steps/
- pinning processes to particular CPUs can gain or lose speed depending on the situation. Need to do more benchmarks on this
taskset linux command
taskset 0x1 ping 8.8.8.8
https://baiweiblog.wordpress.com/2017/11/02/how-to-set-processor-affinity-in-linux-using-taskset/
time taskset --cpu-list 0 echo
time taskset --cpu-list 1 echo
time taskset --cpu-list 0 ls
time taskset --cpu-list 1 ls
# time each command unpinned ("CLEAN"), then pinned to each of four cores via taskset
for command in "ls" "ps" "df" "echo" "hostname" "uname" "dir" "date" "pwd" "which" "netstat"
do
  echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "---- CLEAN - $command ----"
  time "$command" > /dev/null
  echo ""
  echo "---- CPU0 - $command ----"
  time taskset --cpu-list 0 "$command" > /dev/null
  echo ""
  echo "---- CPU1 - $command ----"
  time taskset --cpu-list 1 "$command" > /dev/null
  echo ""
  echo "---- CPU2 - $command ----"
  time taskset --cpu-list 2 "$command" > /dev/null
  echo ""
  echo "---- CPU3 - $command ----"
  time taskset --cpu-list 3 "$command" > /dev/null
  echo "=================================================================================="
  echo ""
done
multithreading python
https://realpython.com/intro-to-python-threading/
https://docs.python.org/3/library/threading.html
https://stackoverflow.com/questions/2846653/how-can-i-use-threading-in-python
bash script run script across multiple processors
https://stackoverflow.com/questions/15343561/bash-running-the-same-program-over-multiple-cores
https://stackoverflow.com/questions/2425870/multithreading-in-bash
https://stackoverflow.com/questions/7712563/multitasking-on-linux-with-multiple-cpus
bash multiprocessing
https://stackoverflow.com/questions/6441509/how-to-write-a-process-pool-bash-shell
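The two patterns that keep coming up for a bash process pool: xargs -P, and a loop throttled with wait (a sketch; ./process.sh and the data directory are placeholders; wait -n needs bash 4.3+):
printf '%s\n' data/*.txt | xargs -n1 -P4 ./process.sh    # at most 4 jobs in flight
for f in data/*.txt; do
  while (( $(jobs -rp | wc -l) >= 4 )); do wait -n; done    # throttle to 4 background jobs
  ./process.sh "$f" &
done
wait    # let the stragglers finish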
- this is close to what I need, but I need it at a smaller scale
https://www.vodafone.com.au/foundation/dreamlab
https://www.vodafone.com.au/red-wire/dreamlab
dreamlab garvan open source
https://www.garvan.org.au/support-us/dreamlab/
https://www.garvan.org.au/news-events/news/dreamlab-users-complete-project-decode
- job control is a big issue. You basically need to keep track of all systems in the network and then farm out jobs based on which system is most suitable or available. That means inter-process communication and job control systems of some sort
ansible quickstart
https://docs.ansible.com/ansible/latest/user_guide/quickstart.html
https://devhints.io/ansible-guide
https://ryaneschinger.com/blog/ansible-quick-start/
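Ad-hoc ansible commands get surprisingly far for this kind of job farming (a sketch; hosts.ini and the "workers" group are placeholders):
ansible all -i hosts.ini -m ping         # which systems are reachable?
ansible all -i hosts.ini -a "uptime"     # check load before farming out work
ansible workers -i hosts.ini -m copy -a "src=job.sh dest=/tmp/job.sh mode=0755"    # push a job script out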
passwordless ssh
ssh-keygen -t rsa    # generate a local key pair
ssh b@B mkdir -p .ssh    # ensure the remote .ssh directory exists
cat .ssh/id_rsa.pub | ssh b@B 'cat >> .ssh/authorized_keys'    # append the public key on the remote host
http://www.linuxproblem.org/art_9.html
https://linuxize.com/post/how-to-setup-passwordless-ssh-login/
https://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/
thinthread source code
https://github.com/CHEF-KOCH/NSABlocklist
https://github.com/CHEF-KOCH?tab=repositories
moab linux
https://en.wikipedia.org/wiki/Moab_Cluster_Suite
https://en.wikipedia.org/wiki/Maui_Cluster_Scheduler
http://adaptivecomputing.com/products/open-source/maui/
- I can use programming to parallelise some processes. I can't parallelise across to the GPU though, since the underlying SDKs won't allow it unless I write a wrapper script to encapsulate what I want to do via driver functions? Worried it will be like the parallel program: the overhead won't be worth it? Only way to know is to do more testing. Something like ViDock to deal with a large scale CUDA farm, or is this not enough? Else, something like an Antminer?
run bash script on gpu cuda
linux cli cuda bash
https://linoxide.com/linux-how-to/install-cuda-ubuntu/
antminer
https://shop.bitmain.com/promote/antminer_s9i_asic_bitcoin_miner/overview
https://antminers9.com.au/
- the problem with parallel is that it allows for easier programming but it's much slower than straight programming? It allows for simpler but slower programming. Would have been nice had it been faster, because it would be like multi-threading except simpler
time parallel echo 1 2 ::: 3 4    # GNU parallel: runs "echo 1 2 3" and "echo 1 2 4"
time echo "1 2 3"; echo "1 2 4"    # note: only the first echo is timed here
time echo "1 2 3"; time echo "1 2 4"
time for i in "1 2"; do echo "$i 3"; echo "$i 4"; done    # serial equivalent
- I've investigated how to share CPU processes across the network, since latency is more important than throughput in my case
sharing cpu across network linux
- cluster based on visuals, attributes, data reductionism/hashing/distilling style algorithms, etc... The more attributes you look at, the more accurate things will obviously be down the line
- do you have to make a choice between fast algorithms and accurate ones, or can you come up with a compromise? Do you use multiple algorithms: a first one to check whether something is worth looking at, and then others to dig deeper? A little bit like multi-pass algorithms?
- need a way to resume from where we left off? Re-processing takes too much time for large data sets
- need auto sampling to make things quicker? You need a fast sampling mechanism, otherwise you could be scanning a lot of different files for something that may never eventuate (see the shuf examples after these links)?
randomise lines linux cli
https://stackoverflow.com/questions/2153882/how-can-i-shuffle-the-lines-of-a-text-file-on-the-unix-command-line-or-in-a-shel/2153897
https://howto.lintel.in/shuffle-lines-file-linux/
https://shapeshed.com/unix-shuf/
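shuf covers most of the sampling requirement mentioned above (big.csv is a placeholder):
shuf -n 1000 big.csv > sample.csv     # uniform random sample of 1,000 lines
tail -n +2 big.csv | shuf -n 1000     # same, but skip a header row first
sort -R big.csv | head -n 1000        # slower fallback where shuf isn't available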
- need automated outlier and anomaly detection? These outliers often fudge the data?
- need a way to add data over time while not having to re-process everything?
- prediction, clustering, and (human) incident response are key to finding patterns. I've looked at generic AI/ML capabilities but they're slow, inappropriate, overkill, etc... for what I want to do? The other problem is that you're limited to what that particular library/API can do as well?
siem pattern finding algorithms
Fortunately, machine learning can aid in solving the most common tasks including regression, prediction, and classification. In the era of extremely large amounts of data and a cybersecurity talent shortage, ML seems to be the only solution.
This article is an introduction written to give practical technical understanding of the current advances and future directions of ML research applied to cybersecurity.
- getting a plot on the CLI is actually somewhat difficult. The main problem is that it often doesn't scale properly; you end up with graphs that aren't genuinely representative of the data in question. More research required
git clone https://github.com/dandavison/eplot
easy graph plot linux cli
https://www.linuxlinks.com/excellent-free-plotting-tools/
https://computingforgeeks.com/termgraph-command-line-tool-draw-graphs-in-terminal-linux/
https://superuser.com/questions/825588/what-is-the-easiest-way-of-visualizing-data-from-stdout-as-a-graph
https://stackoverflow.com/questions/123378/command-line-unix-ascii-based-charting-plotting-tool
https://unix.stackexchange.com/questions/190337/how-can-i-make-a-graphical-plot-of-a-sequence-of-numbers-from-the-standard-input
http://zenonharley.com/gnuplot/cli/2015/06/29/graphing-data-from-the-command-line.html
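One option worth trying from the links above is gnuplot's dumb terminal, which at least autoscales (a sketch; data.txt is a placeholder with one number per line):
gnuplot -e "set terminal dumb size 120,30; plot '-' with lines notitle" < data.txt
seq 1 50 | awk '{print $1*$1}' | gnuplot -e "set terminal dumb; plot '-' with lines"    # quick self-test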
ggplot example
http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
http://r-statistics.co/ggplot2-Tutorial-With-R.html
alternate eplot linux
splunk alternative to
https://en.wikipedia.org/wiki/Splunk
https://docs.fluentd.org/how-to-guides/free-alternative-to-splunk-by-fluentd
https://alternativeto.net/software/splunk/
https://alternativeto.net/software/splunk/?license=free
- file formats can be a problem. These formats were supposed to help deal with this? It's generally the same fields you're dealing with: space, defense, meteorology, etc., for Big Data
https://www.unidata.ucar.edu/software/netcdf/
https://en.wikipedia.org/wiki/NetCDF
https://en.wikipedia.org/wiki/Hierarchical_Data_Format
https://support.hdfgroup.org/HDF5/whatishdf5.html
https://en.wikipedia.org/wiki/GRIB
https://www.nco.ncep.noaa.gov/pmb/docs/grib2/grib2_doc/
https://en.wikipedia.org/wiki/Project_Jupyter
https://shiny.rstudio.com/
http://www.levlafayette.com/repositories
- these are the types of algorithms I'm most interested in. Can you generate random algorithms, or pick patterns from data, apply them, and see which one/ones work best?
pastebin email binance
- the algorithms are basically universal, but the field most heavily investigated is probably finance? Technical traders whose strategies work tend to keep them hidden? Reminds me of closed versus open source based strategies. Notes on Nick Radge: sells strategies and books, and is a money manager. Active fund manager, but goes 'long' for super/retirement funds. Uses Premium Data for information and Amibroker for trading software. Trend follower and mean reversion. Book called Unholy Grail. Growth Portfolio = Bollinger Band Breakout. thechartist.com.au/chat eBook similar to the strategy that he uses in Australia and the US
How to develop robust trading systems _ Nick Radge, The Chartist
http://dtbnguyen.blogspot.com/2019/12/news-vix-script-jim-simonsed-thorp.html
algorithm periodic series of numbers
- useful patterns to note are that massive jumps/hard drops are indicative of insider trading, share buy-backs, market manipulation, good/bad news, etc... There are lots of different reasons for particular patterns. Study particular traders and try to algorithmically apply their strategies to practice pattern matching? In terms of science it could indicate boundary conditions where/when significant changes occur overall within the system in question?
https://www.tradingview.com/u/alfiedotwtf/#published-charts
- one thing that I've figured out is that the Big 5 IT firms are beatable from a technical standpoint. You just need enough financial backing to put your work into play? That probably explains the nature of funding in the Venture Capital community?
https://www.wired.com/2013/08/coreos-the-new-linux/
https://en.wikipedia.org/wiki/Container_Linux
https://dtbnguyen.blogspot.com/2017/08/the-big-5-us-it-firms-arent-unbeatable.html
- I've looked at some relevant projects. Nothing clean or relevant enough to what I want?
seti@home sourcecode
https://setiathome.berkeley.edu/sah_source_code.php
https://setiathome.berkeley.edu/sah_porting.php
https://github.com/BOINC/boinc
https://github.com/UCBerkeleySETI
easy way to parallelise shell script
https://unix.stackexchange.com/questions/103920/parallelize-a-bash-for-loop
- there aren't too many genuine pattern matching algorithms? My guess is that you may need to process data and then look for matches in upcoming data? This means a permanent state of analysis if you want to watch short to medium term trends?
find occurrences of pattern in string linux cli
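For the simple cases grep already covers this ('pattern' and file.txt are placeholders):
grep -c 'pattern' file.txt            # number of matching lines
grep -o 'pattern' file.txt | wc -l    # number of individual (non-overlapping) occurrences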
1 Overview
1.1 Probabilistic classifiers
1.2 Number of important feature variables
2 Problem statement (supervised version)
2.1 Frequentist or Bayesian approach to pattern recognition
3 Uses
4 Algorithms
4.1 Classification algorithms (supervised algorithms predicting categorical labels)
4.2 Clustering algorithms (unsupervised algorithms predicting categorical labels)
4.3 Ensemble learning algorithms (supervised meta-algorithms for combining multiple learning algorithms together)
4.4 General algorithms for predicting arbitrarily-structured (sets of) labels
4.5 Multilinear subspace learning algorithms (predicting labels of multidimensional data using tensor representations)
4.6 Real-valued sequence labeling algorithms (predicting sequences of real-valued labels)
4.7 Regression algorithms (predicting real-valued labels)
4.8 Sequence labeling algorithms (predicting sequences of categorical labels)
pattern recognition algorithms linux
The process itself looks like this:
Data is gathered from its sources (via tracking or input)
Data is cleaned up from the noise
Information is examined for relevant features or common elements
These elements are subsequently grouped in specific segments
The segments are analyzed for insights into data sets
The extracted insights are implemented into the business operation.
- based on personal testing, csvkit is pretty slow, st is middling, and ministat/R/PSPP are quick. These can get basics such as moving average, mean/average, standard deviation, etc...
- the frustrating thing about using scripting languages is that even the simplest bit or character level comparisons are much more complicated than they need to be (see the XOR sketch below). Might need to do a hybrid setup?
print first few characters linux
xor two files linux
Looking for XOR files utility
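For what it's worth, a pure shell XOR of two files is possible but painfully slow, which rather proves the point above (file1/file2 are placeholders and assumed to be the same length):
paste <(xxd -p -c1 file1) <(xxd -p -c1 file2) | while read -r a b; do
  printf '%02x' $(( 0x$a ^ 0x$b ))    # XOR one byte pair at a time
done | xxd -r -p > xored.bin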
cat asl.de.txt.ministat | grep '|' | grep -v '_' | sed 's/|//g' | cut -c1-32 > asl1    # keep data rows, strip pipes, take the first 32 characters
cat asl.de.txt.ministat | grep '|' | grep -v '_' | sed 's/|//g' | cut -c33- > asl2     # ...and put the remainder into a second file
- will have to deal with error correction as the system scales up
https://www.newyorker.com/magazine/2018/12/10/the-friendship-that-made-google-huge
Random Stuff:
- as usual thanks to all of the individuals and groups who purchase and use my goods and services
- I get a lot of SPAM/invitations to write for journals. Thanks, just busy on other stuff at the moment...
- latest in science and technology
Planet SOS: Can GMO plants stop global warming?
Chinese scientists unveil surveillance ‘super camera’
https://www.businessinsider.com.au/space-breakthroughs-2019-exoplanets-solar-probe-2019-12?r=US&IR=T
https://www.socialtalent.com/blog/recruitment/how-to-go-from-40-to-4000-linkedin-contacts-in-4-hours
- latest in finance and politics
Is Bolivia's Evo Morales the victim of a coup _ UpFront (Feature)
Can the US dominate the global financial system forever _ Bottom Line
Holy crusade of climate change sets out to save our polluted souls
UNGA: FM Lavrov lays out Russian view of the world | 2019
Billionaire Bloomberg Jumping Into Presidential Race – But WHY
The Global Warming Scam - David Icke
https://www.local10.com/news/florida/broward/thief-robs-foot-locker-store-at-sawgrass-mills-fbi-says
- latest in defense and intelligence
[182] Nuclear Pollution in The Marshall Islands, Plus Democracy in The US with Roslyn Fuller
Immunity for sale: Diplomatic passport trade investigated
'Veterans Day' means nothing if US gov't betrays veterans
- latest in animal news
How panda center overcame its breeding rut
- latest in music and entertainment
Random Quotes:
- China is building a particle accelerator that will be twice as large and seven times as powerful as CERN’s LHC, and astrophysicist Martin Rees, famous for contributions ranging from black hole formation to extragalactic radio sources and the evolution of the universe, thinks that there’s a chance the colliders could cause a “catastrophe that engulfs space itself”.
Contrary to popular perception, the vacuum of space is not an empty void. The vacuum, Rees states, has in it “all the forces and particles that govern the physical world.” And he adds, it’s possible that the vacuum we can observe is in reality “fragile and unstable.”
What this means is that when a collider such as CERN’s LHC creates unimaginably concentrated energy by smashing particles together, Rees says, it can create a “phase transition” which would “tear asunder the fabric of space”, causing “a cosmic calamity not just a terrestrial one.”
The possibility is that quarks would reassemble themselves into compressed objects called strangelets. That in itself would be harmless. However under some hypotheses a strangelet could, by contagion, convert anything else it encounters into a new form of matter, transforming the entire earth into a hyperdense sphere about one hundred meters across – the size of a soccer field.
The building blocks of matter in our universe were formed in the first 10 microseconds of its existence, according to the currently accepted scientific picture. After the Big Bang about 13.7 billion years ago, matter consisted mainly of quarks and gluons, two types of elementary particles whose interactions are governed by quantum chromodynamics (QCD), the theory of strong interaction. In the early universe, these particles moved (nearly) freely in a quark-gluon plasma. Then, in a phase transition, they combined and formed hadrons, among them the building blocks of atomic nuclei, protons and neutrons.
The experiments during 2018 at world-wide highest energy with the ALICE detector at the Large Hadron Collider (LHC) at the research center CERN produce matter where particles and anti-particles coexist, with very high accuracy, in equal amounts, similar to the conditions in the early universe. The team confirms, with analysis of the experimental data, theoretical predictions that the phase transition between quark-gluon plasma and hadronic matter takes place at the temperature of 156 MeV. This temperature is 120,000 times higher than that in the interior of the sun.
While unfounded speculation has swirled around the Large Hadron Collider since two yellow dots on a screen signaled that protons had been activated in 2008, CERN has always maintained that the work carried out there is safe, stating that there’s nothing being done at the lab that nature hasn’t already “done many times over during the lifetime of the Earth and other astronomical bodies.”
The LHC has officially stated that the collider “has now run for eight years, searching for strangelets without any detection.”
“The second scary possibility is that the quarks would reassemble themselves into compressed objects called strangelets,” writes Rees. “That in itself would be harmless. However under some hypotheses a strangelet could, by contagion, convert anything else it encounters into a new form of matter, transforming the entire earth in a hyperdense sphere about one hundred meters across.”
Rees minimizes fears, observing that “innovation is often hazardous,” but that “physicists should be circumspect about carrying out experiments that generate conditions with no precedent, even in the cosmos.”
Since its unveiling in 2008, the LHC has been the world center of particle physics research. In a tunnel 17 miles in circumference and more than 500 feet below the surface of the Swiss-French border, the LHC smashes subatomic particles at nearly the speed of light and has seen breakthrough discoveries including the Higgs boson. But fundamental questions about the makeup of our universe remain unanswered, and many of the proposed solutions lie beyond the reach of the current LHC.
A successor is needed, and the broad consensus in the particle physics community is that there will be only one successor to the LHC — and China is building it.
The Chinese supercollider, at 34 miles in circumference, would be double the size of the LHC, and would be located near the Chinese town of Qinhuangdao at the coastal end of another enormous project of the past, the Great Wall. The Chinese plan is not without its competitors. The other two proposals are Japan’s International Linear Collider, an electron-positron collider, and CERN’s Future Circular Collider, a proton-proton collider located in Europe. Breaking ground as early as 2021 and starting to take data by 2028, the Chinese behemoth aims to be in operation until 2055 and define the frontiers of particle physics for the next two generations.
The Daily Galaxy via University of Munster, Science Alerts, and Foreign Policy
- Animal-rights activists had warned that donkeys, particularly on the touristic island of Santorini, had been forced to work long hours, seven days a week while carrying “excessive” loads. They said that some animals had faced spinal injuries, as well as open wounds from ill-fitting saddles.
“It’s recommended that animals should carry no more than 20 percent of their own body weight,” a spokesman for the charity Help the Santorini Donkeys said, according to Yahoo7 News.
“The obese and overweight tourists, combined with the lack of shade and water, as well as the sheer heat and 568 cobbled steps, is what is causing such a problem,” the spokesman added.
Activists had even launched demonstrations in the Greek capital of Athens to raise awareness about the donkeys’ plight. Elisavet Chatzi, 45, who joined the movement, hailed the government’s decision as a positive step.
“It’s a very big step, I think all our hard work has paid off,” the activist told British newspaper Metro. “We have won our fight because of the international media attention on the topic. No one could ever believe that new regulations would be set.”
Donkeys are frequently used on Santorini and other Greek islands to transport tourists over hilly and steep terrain that cannot be traversed by vehicles. As tourism has reached a record high, with 32 million visitors expected this year, the animals have faced increasing pressure. Environmental activists have also warned about the broader implications of a growing number of tourists on the country’s picturesque islands.
- Most GitHub users - about 80% - come from outside of the United States. Not surprisingly, the individual country with the most contributors remains the United States; however, China and India (numbers two and three, respectively) have increased their rankings, as has Brazil (now ranked seventh, up from tenth in 2017). Australia comes in 12th by number of contributors, ahead of Spain.
GitHub is seeing new growth among organisations in non-English speaking countries, with Saudi Arabia increasing by 2.2 times from 2017, followed by Nigeria (2.1x) and Egypt, Venezuela and Indonesia.
Similarly, the fastest increase in public and private repositories is in Algeria, with a 2.3 factor increase from 2017, followed by Hungary (2.1x) and Egypt (2x).
No matter where people live, or whether they worked on public, private or open source repositories, GitHub found a universal trend of developers actively working on their code until 10 pm.
While work continues on the weekend, globally a drop in activity on private repositories occurs when the working week ends. Projects similarly scale back around major holidays.
People aside, the top 10 open source projects by the number of unique contributors have been revealed as:
Microsoft/vscode
facebook/react-native
tensorflow/tensorflow
angular/angular-cli
MicrosoftDocs/azure-docs
angular/angular
ansible/ansible
kubernetes/kubernetes
npm/npm
DefinitelyTyped/DefinitelyTyped
When it comes to the greatest number of contributions to open source projects, the results are something that was unthinkable when GitHub launched ten years ago:
Microsoft, with 7,700 contributions
Google - 5,500
Red Hat - 3,300
UC Berkeley - 2,700
Intel - 2,200
Language-wise, JavaScript remains the number one language, while TypeScript has jumped three spots from its number ten place in 2017, now the seventh most popular language on GitHub. It experienced this popularity across every region.
The top emoji, you ask? It turns out GitHubbers are an encouraging lot, with thumbsup the most-used emoji out of all reactions, followed by tada, heart, laugh and eventually thumbsdown and frown in fifth and sixth place. Contributors to Ruby offered the highest percentage of heart emojis.
Additionally, GitHub stated over five million vulnerability alerts were issued during 2018, and over 800,000 have been resolved.