
Web hosting

Thursday, January 2, 2020

Finding Patterns in Data, Genetic Data Sources, and More

- as I said previously, I'm interested in finding patterns in data for various reasons. I've been thinking about algorithms that look for patterns in data for a number of projects. It's not easy to come up with a general solution that can be applied across the board, is easy to implement/understand, is quick, etc.
The primary research areas include platforms to improve prediction and understanding of complex data; platforms to enable trustworthy inferences and risk-based decisions; and data systems to enable ethical, robust and scalable AI.
- pattern recognition isn't easy, especially as the amount of data you're dealing with grows. There aren't many widely available software options that can recognise new patterns in arbitrary sets of data, and those that do exist only work for a limited/narrow set of data. In other cases, you need to narrow down the search space by reducing the total number of patterns you're looking for. Hint: catch up on combinatorics and permutation theory.
- you may need to catch up on your data science, as it's commonly used across the AI/ML space
data science free certificate
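To get a feel for how quickly the pattern search space mentioned above grows, here is a minimal Python sketch of the basic combinatorics (Python 3.8+ for `math.comb`/`math.perm`). The function names are hypothetical, and the alphabet size and column counts are arbitrary examples rather than figures from any real dataset.

```python
from math import comb, perm


def sequence_space(alphabet_size: int, length: int) -> int:
    """Distinct sequences of `length` symbols when symbols may repeat (n ** k)."""
    return alphabet_size ** length


def field_pairs(n_fields: int, k: int = 2) -> int:
    """Ways to choose k fields out of n to compare when order doesn't matter."""
    return comb(n_fields, k)


def ordered_field_picks(n_fields: int, k: int) -> int:
    """Ways to choose k fields out of n when order matters (permutations)."""
    return perm(n_fields, k)


if __name__ == "__main__":
    # A 4-letter alphabet (e.g. A/C/G/T) explodes quickly as pattern length grows.
    for length in (4, 8, 16):
        print(f"length {length}: {sequence_space(4, length):,} possible patterns")
    # Comparing columns of a hypothetical 50-column CSV.
    print(f"pairs of fields: {field_pairs(50):,}")
    print(f"ordered triples of fields: {ordered_field_picks(50, 3):,}")
```

Doubling the pattern length squares the number of candidates (4^8 is 65,536, while 4^16 is over four billion), which is why fixing the pattern length, alphabet or set of fields up front makes such a difference.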
- cache/store memories/branches in order to increase speed? The more you understand how the brain works and how AI/ML works, the more you realise how big the technological gap is. Override old memories you don't need
- twist/wrangle data into forms that are easier to work with? Change from column-based systems to row-based ones? This seems to be a common problem across many data-mining-style technologies, and it will probably be the most important part of analysing non-human-created information
find pattern in csv file linux
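For the column-to-row point above, here is a minimal Python sketch that transposes a CSV using only the standard library. `transpose_csv` is a hypothetical helper name; it assumes the whole file fits in memory, and it pads ragged rows with empty cells rather than silently dropping data.

```python
import csv
import io
from itertools import zip_longest


def transpose_csv(text: str) -> str:
    """Turn every column of a CSV into a row (and every row into a column)."""
    rows = list(csv.reader(io.StringIO(text)))
    # zip_longest pairs the i-th cell of every row; short rows are padded with ""
    columns = zip_longest(*rows, fillvalue="")
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(columns)
    return out.getvalue()


if __name__ == "__main__":
    sample = "date,a,b\n2020-01-01,1,2\n2020-01-02,3,4\n"
    print(transpose_csv(sample), end="")
    # date,2020-01-01,2020-01-02
    # a,1,3
    # b,2,4
```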
- it's obvious that correlation and causation aren't the same. For some people, working out when/if there is causation is easy. If you have date-based data, you can offset two fields of data and then look for correlations between their movements to see if there may be causation? The wider the offset, the greater the lead time.
offset csv fields
offset csv fields linux
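The offset idea above is usually called lagged (or cross-) correlation. Here is a minimal pure-Python sketch; the helper names are hypothetical, and a strong correlation at some lag only suggests a lead/lag relationship, it doesn't prove causation on its own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    # Raises ZeroDivisionError if either series is constant
    return cov / sqrt(vx * vy)


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag]; lag must be >= 0."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    pairs = list(zip(leader, follower[lag:]))  # unmatched ends are dropped
    if len(pairs) < 2:
        raise ValueError("lag too large for the series length")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


def best_lag(leader, follower, max_lag):
    """Lag in 0..max_lag with the highest correlation."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(leader, follower, k))


if __name__ == "__main__":
    leader = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
    follower = [0, 0] + leader[:-2]  # the same movements, two steps later
    for k in range(4):
        print(k, round(lagged_correlation(leader, follower, k), 3))
    print("best lag:", best_lag(leader, follower, 4))  # best lag: 2
```

In practice you'd align the two series on their dates first and then try a range of lags; a lag that still correlates strongly at a wider offset implies a longer lead time.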
- I looked around, but there aren't many genuine pattern-finding algorithms out there. The most common approach seems to be that you specify patterns before looking for something. Otherwise, you look for common attributes and then categorise based on them. Computationally, this style of pattern finding is obviously very expensive
scripts to find patterns in csv
Best way to find patterns in csv file? - Perl Monks
sudo apt-get install xlsx2csv visidata catdoc csvtool weka weka-doc littler pspp csvkit 
find pattern csv
- as I indicated in other posts, I believe that pattern recognition algorithms aren't that different from one another, which makes me sceptical of some scientists who say that (for wider-band AI/ML) the algorithms may not be interchangeable. It seems a lot of them may have gone for money rather than trying to maximise use across the board?
codes which have never been cracked
seti algorithms
- at its core, I think a lot of categorisation algorithms work on the basis of summarisation and comparison. This can involve reducing the core elements to a smaller number of attributes
- one example of this is plotting the data and comparing what the plots look like
linux graph plot cli
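A minimal sketch of the summarise-then-compare idea: reduce each series to a handful of attributes and pick the closest known series by Euclidean distance. The choice of features (mean, spread, min, max) and the helper names are assumptions for illustration; real data would usually need the features scaled first so that one large-valued feature doesn't dominate the distance.

```python
from math import dist
from statistics import mean, pstdev


def summarise(series):
    """Reduce a numeric series to a small feature vector (hypothetical feature choice)."""
    return (mean(series), pstdev(series), min(series), max(series))


def most_similar(target, candidates):
    """Name of the candidate whose summary is closest to the target's summary."""
    t = summarise(target)
    return min(candidates, key=lambda name: dist(t, summarise(candidates[name])))


if __name__ == "__main__":
    known = {
        "flat": [5, 5, 5, 5],
        "rising": [1, 2, 3, 4],
        "spiky": [0, 10, 0, 10],
    }
    print(most_similar([2, 3, 4, 5], known))  # rising
```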
- another is trying to come up with a line of best fit or formula that best represents the data in question, and then comparing those
The t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. 
Analysis of variance is a collection of statistical models and their associated estimation procedures used to analyze the differences among group means in a sample. ANOVA was developed by statistician and evolutionary biologist Ronald Fisher.
In statistics, linear regression is a linear approach to modeling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression.
covariance two fields linux
find correlation fields linux
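For the line-of-best-fit and covariance searches above, here is a minimal sketch of simple linear regression by ordinary least squares, where the slope works out to the covariance of x and y divided by the variance of x. The function names are hypothetical; tools such as R or PSPP will do the same job for you.

```python
def sample_covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx  # equals cov(x, y) / var(x)
    intercept = my - slope * mx
    return slope, intercept


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [1, 3, 5, 7]
    print(fit_line(xs, ys))           # (2.0, 1.0)
    print(sample_covariance(xs, ys))  # 3.3333333333333335
```

Comparing the fitted slopes and intercepts of two datasets is one simple way to do the "compare the formulas" step described in the bullet.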
- often you can't get data that lines up. The best you can do is join the relevant data and check for correlations
- it's only when you understand what people may be doing that you understand the kind of computing power that is required. The more data that needs to be scanned, the more computing power needs to be applied. This means the more capital you have, the more likely you can build a better operation. Not the best situation in the world? Basically, we're dealing with a lot of simple arithmetic operations. Floating-point operation (FLOPS) benchmarks are probably the best way to gauge potential hardware performance
hedge fund computer systems
cern computer systems
nasa computer systems
- if you understand this area, you'll realise the need to shave every second you can off data processing operations because of the number of operations you need to run. Imagine an inefficient operation run 10 times vs 10,000 (or more) times. You can save huge amounts of time just by optimising a single operation that is likely to be repeated
- I consider this area to be pattern recognition rather than the genuine AI/ML that many others think of it as. There are some free AI/ML courses out there, but not many that include certification at the same time. Once you get access to data, you need to be able to generate theories (through your own mind or via AI), then test them and check the feedback. From this data you can extrapolate theories? Most theories rely on basic statistics and probability theory
- RapidMiner does a lot of what I want but is not free. psppire (the PSPP GUI) seems to be faster than Excel and comes with much of the core functionality that I want out of the box in an easy-to-use GUI
using rapidminer
free alternative rapidminer
- the most basic comparison would be frequency counts of particular symbols in a file. The problem with this is that if you use varying window sizes and have larger files, this can take a lot of time, as my work on Bible codes showed. Does a given sample contain the same symbols? Are they in repeating patterns?
- another comparison would be of slope/gradient
get gradient of series of numbers linux
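A minimal Python sketch of sliding-window symbol counts, the "most basic comparison" above. Each window width is a single pass over the text, but it builds up to n substrings of that width, so trying many widths over a large file multiplies the cost, which matches the slowdown described. The helper names are hypothetical.

```python
from collections import Counter


def window_counts(text: str, width: int) -> Counter:
    """Count every substring of length `width` (a sliding window, aka n-grams)."""
    return Counter(text[i:i + width] for i in range(len(text) - width + 1))


def repeated_patterns(text: str, width: int, min_count: int = 2) -> dict:
    """Windows that occur at least `min_count` times."""
    return {gram: n for gram, n in window_counts(text, width).items() if n >= min_count}


if __name__ == "__main__":
    text = "abcabcxyzabc"
    # Varying the window size, as described above
    for width in range(2, 5):
        print(width, repeated_patterns(text, width))
```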
- another comparison would be change of relative direction
- another comparison would be a change in spread
- another comparison would be change in divisor
In linear algebra, an eigenvector or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue is the factor by which the eigenvector is scaled.
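The slope/gradient, change-of-direction and change-in-spread comparisons listed above can each be reduced to a few lines of Python. This is a minimal sketch with hypothetical helper names; "gradient" here is just the difference between consecutive values, which assumes evenly spaced samples.

```python
from statistics import pstdev


def gradient(series):
    """Difference between consecutive values (assumes evenly spaced samples)."""
    return [b - a for a, b in zip(series, series[1:])]


def direction_changes(series):
    """How many times the series switches between rising and falling."""
    signs = [(d > 0) - (d < 0) for d in gradient(series)]
    signs = [s for s in signs if s != 0]  # ignore flat steps
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def rolling_spread(series, window):
    """Population standard deviation over each sliding window."""
    return [pstdev(series[i:i + window]) for i in range(len(series) - window + 1)]


if __name__ == "__main__":
    data = [1, 3, 2, 2, 5, 9, 4]
    print(gradient(data))           # [2, -1, 0, 3, 4, -5]
    print(direction_changes(data))  # 3
    print(rolling_spread(data, 3))
```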
- another comparison would be bounded change of formulaic relationship
- another comparison would be distribution of values
- another comparison would be to plot the data and compare the plots using image-recognition-style technologies
- another comparison would be periodic vs random movements
- another comparison would be line of best fit (linear, quadratic, cubic, etc.). This will obviously require more computing power, though
- another comparison would be frequency checks of symbolic patterns
- another comparison would be reduction of data and comparison of that. These techniques are commonly used in most forms of pre-packaged data analysis software and are often purely statistical
- another comparison is trying to find patterns of behaviour in data you already have/know and then trying to fit them onto the data that you're analysing
- shapes/lines of best fit are another option
- for a given number of symbols, how often are there repeating symbols? How much entropy is there? I actually thought about this but realised there isn't much of a difference between reduction over two sets of numbers and comparison
In statistical mechanics, entropy is an extensive property of a thermodynamic system. It is closely related to the number Ω of microscopic configurations that are consistent with the macroscopic quantities that characterize the system. 
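The quote above is about thermodynamic entropy; for symbols in a file, the usual analogue is Shannon (information) entropy, measured in bits per symbol. A minimal sketch with a hypothetical function name: low values mean a few symbols dominate (lots of repetition), and the maximum is log2 of the number of distinct symbols.

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols) -> float:
    """Shannon entropy in bits per symbol: sum of p * log2(1/p) over the symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum((c / n) * log2(n / c) for c in counts.values())


if __name__ == "__main__":
    for sample in ("AAAAAAAA", "ABABABAB", "ACGTACGT", "AACGTTTG"):
        print(sample, round(shannon_entropy(sample), 3))
```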
Applied mathematics is the application of mathematical methods by different fields such as science, engineering, business, computer science, and industry. Thus, applied mathematics is a combination of mathematical science and specialized knowledge. The term "applied mathematics" also describes the professional specialty in which mathematicians work on practical problems by formulating and studying mathematical models.
In the past, practical applications have motivated the development of mathematical theories, which then became the subject of study in pure mathematics where abstract concepts are studied for their own sake. The activity of applied mathematics is thus intimately connected with research in pure mathematics.
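Going back to the periodic-vs-random comparison in the list above, one standard check is autocorrelation: correlate a series with a shifted copy of itself and see which shift scores highest. A minimal sketch with hypothetical helper names; purely random data tends to give small values at every lag, while a repeating pattern peaks at its period.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation at `lag` (the standard biased estimator)."""
    n = len(series)
    m = sum(series) / n
    var = sum((x - m) ** 2 for x in series)  # a constant series raises ZeroDivisionError
    cov = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return cov / var


def dominant_period(series, max_lag):
    """Lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda k: autocorrelation(series, k))


if __name__ == "__main__":
    rng = random.Random(1)  # seeded so the noise is repeatable
    periodic = [0, 1, 0, -1] * 6
    noise = [rng.gauss(0, 1) for _ in range(24)]
    print("periodic, lag 4:", round(autocorrelation(periodic, 4), 3))
    print("noise, lag 4:", round(autocorrelation(noise, 4), 3))
    print("dominant period:", dominant_period(periodic, 6))  # dominant period: 4
```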
- in reality, building something like Google/Bing may not be as difficult as you think. The main problem is access to resources so that it can access more data and scale
- part of me knows/guesses that it's possible to create a sort of AI/ML capability that would create more knowledge over time if you imbue it with basic characteristics such as infinite curiosity, an ability to befriend others via basic social skills, basic emotions, etc. It would incorporate a crawler, a pattern recognition capability, some randomness, etc. It would still be primitive, but a truer representation of AI/ML?
Over the last 25 years, psychologists have found that personalities coalesce around five basic traits, dubbed the Big Five. Everyone can be described as having varying levels of agreeableness, conscientiousness, neuroticism, extroversion and openness to experience.
- sometimes you're better off using language-specific tools (such as pip) than the local package manager
sudo pip install csvkit
alternative csvkit
- you can often get free alternatives for anything you may want
sas free alternative
rapidminer alternative
rapidminer free
easy statistics software foss
free forecasting csv files
csv analysis
- some things, such as bar graphs, are easier in R than in gnuplot
r scripts
r scripts download
gnuplot bar graph
- it's not that easy to simply parallelise or optimise operations that aren't designed that way from the start
split script linux cli cpu
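A minimal Python sketch of the split-the-work idea using the standard `multiprocessing` module. It only works cleanly because the (hypothetical) per-chunk job is independent and the partial results can simply be added together, which is exactly the "designed that way from the start" property mentioned above. On the command line, GNU `split -n l/N` does a similar job of cutting a file into N pieces without breaking lines.

```python
from multiprocessing import Pool


def chunk(items, n_chunks):
    """Split a list into n_chunks contiguous pieces whose sizes differ by at most 1."""
    size, extra = divmod(len(items), n_chunks)
    pieces, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        pieces.append(items[start:end])
        start = end
    return pieces


def score_chunk(values):
    """Stand-in for an expensive, independent per-chunk job (hypothetical): sum of squares."""
    return sum(v * v for v in values)


def parallel_score(values, workers=4):
    """Score each chunk in its own process and combine the partial results."""
    pieces = chunk(values, workers)
    if workers == 1:
        return sum(map(score_chunk, pieces))  # serial path, no process overhead
    with Pool(processes=workers) as pool:
        return sum(pool.map(score_chunk, pieces))


if __name__ == "__main__":
    # The __main__ guard matters on platforms that start workers by re-importing the script
    data = list(range(1_000_000))
    print(parallel_score(data, workers=4) == score_chunk(data))  # True
```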
- when does a trend become sustained? That may depend on whether it has done this before. Check pattern frequency in relation to the past?
- finding points of resistance via frequency distribution?
technical trading
hidden pivot technical analysis
Keiser Report - Dead Stocks Walking (E1375)
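One rough way to turn "points of resistance via frequency distribution" into code: bucket prices into bins and treat the most crowded bins as candidate support/resistance levels. A minimal sketch; the bin size, helper names and sample prices are arbitrary assumptions, and this isn't how any particular technical-analysis method formally defines resistance.

```python
from collections import Counter


def price_histogram(prices, bin_size):
    """Count how many prices fall into each bin, keyed by the bin's lower edge."""
    return Counter((p // bin_size) * bin_size for p in prices)


def candidate_levels(prices, bin_size, top=3):
    """Lower edges of the `top` most frequently visited bins."""
    return [level for level, _ in price_histogram(prices, bin_size).most_common(top)]


if __name__ == "__main__":
    prices = [98, 101, 103, 104, 102, 109, 104, 103, 96, 104]  # made-up sample
    print(price_histogram(prices, 5))
    print(candidate_levels(prices, 5, top=2))  # [100, 95]
```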
- ironically, whether or not the application and use of data is time-critical, speed feels like a general problem across the board
- at the end of the day, for quantitative data you'll realise you're mostly searching for numerical patterns; in the battle between qualitative and quantitative analysis, the latter is normally easier to deal with

- just like Jim Simons, I'm sceptical of both Creationism and Evolution theories in their entirety (more on this in another post). That said, there just seem to be many examples of what feel like design decisions in bio-organic organisms, for instance modulation, discrete/unique/multiple signalling systems, redundancy, etc. It wouldn't surprise me if various code breakers have had a try at running pattern recognition algorithms over genetic sequences. It also wouldn't surprise me if some of these firms were covers for various other enterprises (the US in particular seems to mix up the military industrial complex and civilian operations a lot?).
- I've come to realise that a lot of experiments on animals and humans aren't actually required. Understanding how bio-organic organisms are built means you can infer a lot of what will happen. This will require more work to see how far I can push the idea, though.
download human genome
Question: Downloading the latest human genome
copy human genome
- the strange thing about the human genome is that it hasn't been properly/completely mapped as yet. The other issue with looking for patterns is that even if you understand the underlying code, you still need to program computers to know what to look for. It's a difficult, time-consuming task for the average home computer.
genome chromosome map
The Human Genome Project, Goals Of HGP, Construction of Genetic Map
What insiders know, however, is not well-understood by the rest of us, who take for granted that each A, T, C, and G that makes up the DNA of all 23 pairs of human chromosomes has been completely worked out. When scientists finished the first draft of the human genome, in 2001, and again when they had the final version in 2003, no one lied, exactly. FAQs from the National Institutes of Health refer to the sequence’s “essential completion,” and to the question, “Is the human genome completely sequenced?” they answer, “Yes,” with the caveat — that it’s “as complete as it can be” given available technology.
Perhaps nobody paid much attention because the missing sequences didn’t seem to matter. But now it appears they may play a role in conditions such as cancer and autism.
“A lot of people in the 1980s and 1990s [when the Human Genome Project was getting started] thought of these regions as nonfunctional,” said Karen Miga, a molecular biologist at the University of California, Santa Cruz. “But that’s no longer the case.” Some of them, called satellite regions, misbehave in some forms of cancer, she said, “so something is going on in these regions that’s important.”
Miga regards them as the explorer Livingstone did Africa — terra incognita whose inaccessibility seems like a personal affront. Sequencing the unsequenced, she said, “is the last frontier for human genetics and genomics.”
- my suspicion is that I'll only be able to look at part of a genetic sequence (I just don't have the computer power, time, resources, etc.). There are many problems with this, but it's better than nothing.
The Human Genome Project was a vast long-running and internationally collaborative project to determine the sequence of nucleotide base pairs that make up human DNA, and identify and map all the genes of the human genome, both physically and functionally. In fact, it is the world’s largest collaborative biological project across all of history.
However, it’s massive. Who’d have thought humans are so complex? With a genome of seven billion DNA base pairs, it takes 100GB to store the unique genetic sequence for any individual human being as a string of text using the letters A, T, C and G that refer to the bases – adenine, thymine, cytosine and guanine.
download genetic sequence
download genetic sequence txt
Question: How To Download Full Genome Sequence
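Since looking at only part of a genome is the realistic option on a home computer, here is a minimal Python sketch that streams a FASTA file line by line (so the whole file never has to fit in memory), optionally stops after a chosen number of bases, and counts each base. It assumes the common UCSC-style conventions: header lines start with `>`, lowercase letters mark soft-masked repeats, and `N` marks unknown/unsequenced positions, which ties in with the gaps discussed in the quote above. The helper names and the file name in the commented example are hypothetical.

```python
from collections import Counter


def base_counts(lines, max_bases=None):
    """Count bases in FASTA text, optionally stopping after `max_bases` bases."""
    counts = Counter()
    seen = 0
    for line in lines:
        if line.startswith(">"):
            continue  # sequence header, not bases
        seq = line.strip().upper()  # fold soft-masked (lowercase) bases into A/C/G/T
        if max_bases is not None:
            seq = seq[: max_bases - seen]
        counts.update(seq)
        seen += len(seq)
        if max_bases is not None and seen >= max_bases:
            break
    return counts


def gc_fraction(counts):
    """Share of G and C among the known bases (N and other codes are excluded)."""
    known = sum(counts[b] for b in "ACGT")
    return (counts["G"] + counts["C"]) / known if known else 0.0


if __name__ == "__main__":
    sample = [">chr_demo\n", "NNNNACGTac\n", "ggccNN\n"]
    counts = base_counts(sample)
    print(counts["N"], round(gc_fraction(counts), 3))
    # For a real (hypothetical) download, stream the gzipped file instead:
    # import gzip
    # with gzip.open("chr21.fa.gz", "rt") as fh:
    #     counts = base_counts(fh, max_bases=5_000_000)
```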
- I believe in a better, more automated, less antagonistic political process, and part of it involves supercomputers trawling publicly available information. The current vision for the future of Big Technology seems to be round-the-clock, centralised surveillance of even private data to provide this. It's obviously customised and individualised, but it can cause psychological problems in some people and exacerbate problems in others. The debate over AI/ML's future direction seems to be playing out all over the place at the moment. I wonder whether it's possible to build inexpensive, AI/ML-based, decentralised, limited-online or offline "friends" that can watch over people at all times? I suspect it's possible but there's no impetus yet. I'm fairly certain that most search engines are biased. Results are being deliberately biased/filtered for political, geopolitical, personal, commercial, intelligence/defense/security, etc. reasons. If you understand how search technology works and compare the results between various search engines, the results seem "off"?
We Need to Talk About Search
The RNC has also invested $200 million in digital, tech, and data programs since 2013. Our data analytics program allows us to identify voter behavior and predict results with an unprecedented level of accuracy. We’ve modeled 72 billion voter predictions to date, and this is a key part of what helped us win special elections last year in Georgia, Montana, South Carolina, Kansas and Utah.
evangelion supercomputer
The S.C. Magi System (マギ) are a trio of supercomputers designed by Dr. Naoko Akagi during her research into bio-computers while at Gehirn.[1] The Magi's 7th generation organic computers were implanted with three differing aspects of Dr. Naoko Akagi's personality using the Personality Transplant OS (Operating System). The same system is used to operate the Evangelions.[2] The Magi are used during episode #13 to connect the simulation bodies to the Evas in their cages and, presumably, the synch-tests when the pilots are not present in the actual Evas are also conducted through the Magi.
The three Magi run Nerv Headquarters and the municipal government of Japan by majority decision.
The three original Magi in Nerv Headquarters collectively form the set Magi 01:
Magi-1: Melchior: Dr. Naoko Akagi as a scientist;
Magi-2: Balthasar: Dr. Naoko Akagi as a mother;
Magi-3: Casper: Dr. Naoko Akagi as a woman.
- I've been half-playing with the idea of building something to check for potential censorship. It's difficult to tell the difference between flawed algorithms and censorship/bias, though. I've done some basic checks, and it feels like they censor material about the US as well as material about its allies?
google censorship
Google and its subsidiary companies, such as YouTube, have removed or omitted information from its services to comply with its company policies, legal demands, and government censorship laws.[1] Google's censorship varies between countries and their regulations, and ranges from advertisements to speeches. Over the years, the search engine's censorship policies and targets have also differed, and have been the source of internet censorship debates.[2]
Numerous governments have asked Google to censor what they publish. In 2012, Google ruled in favor of more than half of the requests they received via court orders and phone calls. This did not include China and Iran who had blocked their site entirely.[3]
On July 25, 2019, Presidential hopeful Tulsi Gabbard sued Google for blocking her ads after the presidential debate when she became one of the most searched terms on the search engine following the debate. [4] Later, according to conservative comedian Steven Crowder, Google again changed search results, apparently only in the United States, to block users from seeing pertinent Gabbard content after she once again became a high-trending topic after Hillary Clinton implied she was a Russian asset. [5]
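For the censorship-checking idea, one basic starting point is to compare the result lists that different engines return for the same query: how much they overlap, and which results one engine shows that another omits. A minimal sketch with made-up result lists and hypothetical helper names; as noted above, low overlap on its own can't tell flawed algorithms apart from deliberate filtering.

```python
def jaccard(a, b):
    """Overlap of two result lists as |A ∩ B| / |A ∪ B| (1.0 if both are empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def missing_from(reference, other):
    """Results in `reference` (in rank order) that `other` doesn't return."""
    others = set(other)
    return [url for url in reference if url not in others]


if __name__ == "__main__":
    engine_a = ["a.org", "b.org", "c.org", "d.org"]  # made-up top results
    engine_b = ["a.org", "c.org", "e.org", "f.org"]
    print(round(jaccard(engine_a, engine_b), 3))  # 0.333
    print(missing_from(engine_a, engine_b))       # ['b.org', 'd.org']
```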
- work continues on building my medical search engine (lots of components need to be built, parts of which I'll use for other projects as well). Many things have become obvious. Current search engine technology (if you understand how data mining, AI/ML, natural language/image/video processing, medical research, etc. work) is limited, and if humans were built, then the Creator/Creators were vastly more advanced than humans currently are. If you sit down and try to build a biological organism from scratch, from a single cell and double helix strand, it becomes much more obvious what large expanses of supposedly non-coding sections may be for (it's obvious that there are a lot of error-checking systems, noop/padding-style codes, codes associated with structure, and other codes/systems that seem to be in place on first inspection). I studied biology a long time ago and have revisited the subject, and the more you dig, the more it feels like humans were built. In another post I built a humanoid-type organism from a single template cell. I'll try to build one from scratch using a single cell and double helix strand in another post. The main problem is that so much needs to be built from scratch. I tried looking for the equivalent of biological compilers (which convert programming logic into actual genetic code), but these seem to be very new, so I may need to build something of my own or else use a pseudo-DNA-like coding system. If humans were built, it'll become really obvious through various structures that are built into their makeup
biological compilers
genetics algebra
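On the pseudo-DNA coding idea: about the simplest scheme maps every 2 bits to one of four bases, so each byte becomes four letters. This is only a minimal, hypothetical illustration (the A/C/G/T bit mapping is an arbitrary choice); it's a storage code, not a biological compiler, and the output says nothing about whether a real cell could use the sequence.

```python
BASES = "ACGT"  # hypothetical mapping: 00->A, 01->C, 10->G, 11->T


def encode(data: bytes) -> str:
    """Encode bytes as a pseudo-DNA string, four bases per byte (high bits first)."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(seq: str) -> bytes:
    """Inverse of encode(); rejects lengths that aren't a multiple of 4."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # ValueError on non-ACGT letters
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    dna = encode(b"Hi")
    print(dna)          # CAGACGGC
    print(decode(dna))  # b'Hi'
```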
- big data, machine learning, artificial intelligence and pattern-finding software seem to be the keywords here. None of it seems to fit the bill, though. The world is dynamic, so you constantly have to change your algorithms?
monte carlo simulator
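As a small example of a Monte Carlo simulation tied to the "when does a trend become sustained?" question: simulate many random walks and count how often a run of consecutive up-moves appears by pure chance. That gives a rough baseline for judging whether an observed run is unusual. A minimal sketch; the coin-flip model, the helper names and the trial count are all assumptions.

```python
import random


def longest_up_run(steps):
    """Length of the longest streak of consecutive positive steps."""
    best = current = 0
    for step in steps:
        current = current + 1 if step > 0 else 0
        best = max(best, current)
    return best


def chance_of_run(n_steps, run_length, trials=10_000, seed=42):
    """Estimate P(a +/-1 random walk of n_steps has >= run_length up-moves in a row)."""
    rng = random.Random(seed)  # fixed seed so results are repeatable
    hits = 0
    for _ in range(trials):
        steps = [rng.choice((1, -1)) for _ in range(n_steps)]
        if longest_up_run(steps) >= run_length:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (3, 5, 7):
        print(f"run of {k}+ ups in 50 steps: ~{chance_of_run(50, k):.2%}")
```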

Random Stuff:
- as usual thanks to all of the individuals and groups who purchase and use my goods and services
- latest in science and technology
- latest in finance and politics
- latest in defense and intelligence
- latest in animal news
- latest in music and entertainment

Random Quotes:
- Cyber attackers stole data from 29 million Facebook accounts using an automated program that moved from one friend to the next, the social media giant has revealed.
Key points:
Facebook says the users hacked are from a "fairly broad" number of countries
The cyber attack started small and spread through "friends of friends"
Facebook has a website to check if your account was breached
But the company said that was less than the 50 million profiles it initially reported after investigators reviewed activity on accounts that may have been affected.
- There are currently three crew members aboard the ISS: NASA astronaut Serena M. Auñón-Chancellor, the European Space Agency’s Alexander Gerst, and cosmonaut Sergey Prokopyev. The trio is currently scheduled to return home in December, and they’re well supplied with food and water, said Kenny Todd, NASA’s ISS operations integration manager, at a press briefing held yesterday.
Their mission could be extended, however, as the Soyuz spacecraft can last in orbit for 200 days, expiring in early January. So if the mission is extended, it’ll only be by a few weeks.
An empty space station would be regrettable. There would be no one on board to monitor and conduct the many scientific experiments currently underway on the $US100 billion ($140 billion) outpost. The good news is that the ISS can be kept operational by ground controllers, as Todd explained during the briefing.
“I feel very confident that we could fly for a significant amount of time [without a crew],” he said. Should the “pumps do their job, and all the other systems — the [solar] arrays to continue to rotate, and we keep the batteries charged — there’s nothing that says we can’t continue... [with a] minimal amount of commanding.”
- Commander Alexander Gerst was ready to welcome two new astronauts to the International Space Station (ISS) — but ended up looking on helplessly as a catastrophic rocket failure sent the incoming crew falling back to Earth.
Key points:
The current astronauts on the ISS may need to extend their six-month mission
It is unknown whether Russia will be able to send replacements to ISS in time
NASA is looking at the potential of running the ISS without a crew
In a series of photos, Mr Gerst captured the moment a Russian Soyuz rocket malfunctioned at the start of what should have been a routine six-hour flight to deliver two astronauts to the ISS.
The failure of the booster rocket, just two minutes after the launch and at an altitude of 50 kilometres, activated an emergency rescue system which sent the capsule carrying US astronaut Nick Hague and Russian cosmonaut Alexei Ovchinin into a dangerous ballistic descent.
Footage showed the pair shaking around in the capsule, enduring gravitational forces of six to seven times more than is felt on Earth as they came down at a sharper-than-normal angle.
About 30 minutes later the capsule parachuted onto a barren area of steppe in Kazakhstan.
"Glad our friends are fine," Mr Gerst, a European Space Agency astronaut from Germany, tweeted from orbit.
"Spaceflight is hard. And we must keep trying for the benefit of humankind."
- Ma did not say when the Jack Ma Institute of Entrepreneurs would launch, but said the aim was to train 1,000 tech leaders a year over the next 10 years.
“We’re giving a lot of opportunities for young Indonesian people to learn,” Ma told reporters after meeting Indonesian ministers on the sidelines of the International Monetary Fund and World Bank meetings being hosted by Indonesia.
The co-founder of Alibaba, China’s biggest e-commerce firm, said it is important for Indonesia to invest in human capital because “only when people improve, when people’s minds change, when people’s skills improve, then we can enter the digital period”.
Indonesia has a shortage of trained engineers in technology and the institute will also train hundreds of developers and engineers on cloud computing to help make Indonesian businesses more digital-savvy.
The country is a key market for Alibaba, whose cloud computing arm Alibaba Cloud launched a data center in Indonesia in March.
Ma said his company would continue to invest “not only on e-commerce, but also cloud computing, logistics and...infrastructure” in Indonesia, while also helping local businesses to grow.
Indonesian Communications Minister Rudiantara told Reuters in September that Indonesia was partnering with Ma to look into ways to harness Alibaba’s businesses to increase its exports, particularly to China.
McKinsey estimated in a report released on Aug. 30 that the value of Indonesia’s e-commerce market will surge to at least $55 billion (£42 billion) by 2022 from $8 billion in 2017.
On Friday, Ma told a panel discussion at the IMF and World Bank meetings that “the internet is designed for developing countries”, with “great opportunities in Africa” also.

3D Printing Background, Random Stuff, and More

- in this post we'll be looking at 3d printers. I basically wanted/needed one to study/research, help fix some stuff around the home, bu...