Wednesday, January 17, 2018

Download Website Document Crawler, WikiReader Replacement, and More

- my WikiReader recently developed a strange error where the screen just goes dark upon loading (it doesn't feel like a hardware fault because the boot screen, with normal characters, shows up first). I have been trying to figure out a solution, or else an alternative in the meantime. That's difficult, as documentation out there is minimal and the company has basically shut down. So I have created a headless crawler to gather what I need online and provide similar functionality offline. You can download it here:
- details are as follows:
# It uses wkhtmltopdf, which is a lot lighter than many of the JavaScript
# (and other) based PDF distiller options out there. A sample file
# is obviously included. It was created as follows:
#
# As this is the very first version of the program it may be VERY buggy. 
# Please test prior to deployment in a production environment.
- here are some alternative options I've been looking at but if you look closer all are pretty 'heavy'. We're talking multi-gigabyte downloads based on what I've seen... 
alternative to minipedia
wikipedia offline android
http://www.forensicswiki.org/wiki/File_Format_Identification
- as I said previously, been looking for an offline version of Wikipedia for a while...
https://stackoverflow.com/questions/2683506/wikipedia-text-download
https://dumps.wikimedia.org/enwiki/
https://dumps.wikimedia.org/dvd.html
https://digiwonk.gadgethacks.com/how-to/download-complete-offline-version-wikipedia-you-can-read-anytime-0140655/
http://meta.wikimedia.org/wiki/Data_dump_torrents
download wikipedia only text older version
https://stackoverflow.com/questions/2683506/wikipedia-text-download
https://en.wikipedia.org/wiki/Wikipedia:Database_download
https://stats.wikimedia.org/EN/ChartsWikipediaZZ.htm
http://wiki-as-ebook.sourceforge.net/
https://github.com/WikiTeam/wikiteam/blob/master/wikipediadownloader.py
https://www.mediawiki.org/wiki/Download
http://www.kiwix.org/downloads/
https://wikisource.org/wiki/Main_Page
wikipedia download text only database
https://stackoverflow.com/questions/2683506/wikipedia-text-download
old version wikipedia database
https://en.wikipedia.org/wiki/Wikipedia_talk:Database_download/Archive_1
https://en.wikipedia.org/wiki/Wikipedia:Database_download
https://dumps.wikimedia.org/enwiki/latest/
https://web.archive.org/web/*/https://dumps.wikimedia.org/enwiki/
- main problem is that they are 'huge' dumps. We're talking 14GB for Wikipedia...
https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
https://dumps.wikimedia.org/
http://dumps.wikimedia.your.org/archive/
http://dumps.wikimedia.your.org/archive/2001/2001-02-19/wikipedia-2001-02-19.tar.bz2
https://dumps.wikimedia.org/backup-index.html
- thankfully, there seem to be multiple kinds of dumps out there, depending on the content that you want
https://dumps.wikimedia.org/
https://dumps.wikimedia.org/backup-index.html
- got no idea what this is...
http://dumps.wikimedia.your.org/archive/2002/2002-04-18/wikipedia_dump_20020321.sql.gz
http://dumps.wikimedia.your.org/archive/2003/2003-01-11/
- this is one of the core file formats that offline options seem to use...
http://www.openzim.org/wiki/OpenZIM
http://www.openzim.org/wiki/ZIM_File_Archive
http://www.openzim.org/wiki/ZIM_Readers
https://github.com/kiwix/
- I've been looking at old dumps (because they're smaller). Can't get the old torrents because they're too big and lack people to seed them
http://kopiwiki.dsd.sztaki.hu/
- have been thinking about indexing my local system and browsing history and then building/downloading the appropriate articles for offline use as well. I know how slow and inefficient (even custom) crawlers can be though
https://en.wikipedia.org/w/index.php?search=joint+strike+fighter&title=Special%3ASearch&fulltext=Search
http://epicroadtrips.us/2003/summer/nola/nola_offsite/FQ_en.wikipedia.org/en.wikipedia.org/wiki/Wikipedia_Database_download.html
- this is the obvious dump of Wikipedia that you want if you want it offline but don't want to spend heaps of time downloading. It's effectively a small subset of Wikipedia
https://simple.wikipedia.org/wiki/Simple_English_Wikipedia
https://en.wikipedia.org/wiki/Simple_English_Wikipedia
https://dumps.wikimedia.org/simplewiki/20180101/
https://dumps.wikimedia.org/simplewiki/
- there are some people who have been working on updates but finding them has been difficult
wikireader update download
reset wikireader
old dump wikipedia
- I obviously work on Mobile Applications from time to time as well so perhaps one day I'll create something more integrated?
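
For anyone curious, the crawler idea above can be sketched roughly as follows. This is not the downloadable program itself, just a minimal Python sketch that assumes wkhtmltopdf is installed and on the PATH. The function names (pdf_name, build_command, crawl) and the output directory are made up for illustration. It takes a list of URLs, drops duplicates, and asks wkhtmltopdf to render each page to a PDF for offline reading.

```python
import shutil
import subprocess
from pathlib import Path
from urllib.parse import unquote, urlparse


def pdf_name(url: str) -> str:
    """Derive a filesystem-safe PDF file name from a page URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Use the last path segment (the article title on a wiki), else the host name.
    stem = unquote(parsed.path.rstrip("/").rsplit("/", 1)[-1]) or parsed.netloc
    safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in stem)
    return safe + ".pdf"


def build_command(url: str, out_dir: Path) -> list[str]:
    """Build the call in the form: wkhtmltopdf [options] <url> <output file>."""
    return ["wkhtmltopdf", "--quiet", url, str(out_dir / pdf_name(url))]


def crawl(urls: list[str], out_dir: Path) -> list[Path]:
    """Render each URL to a PDF, skipping duplicate URLs and failed pages."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf is not installed or not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in dict.fromkeys(urls):  # de-duplicate while keeping order
        cmd = build_command(url, out_dir)
        # Assumption: a non-zero exit code is treated as a failed page.
        result = subprocess.run(cmd, check=False)
        if result.returncode == 0:
            saved.append(Path(cmd[-1]))
    return saved
```

Something like `crawl(["https://simple.wikipedia.org/wiki/Simple_English_Wikipedia"], Path("offline"))` would then leave one PDF per page in the `offline` directory. Feeding it a short list of pages you actually read keeps the download far smaller than a full dump.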

Random Stuff:
- as usual thanks to all of the individuals and groups who purchase and use my goods and services
- latest in science and technology
- latest in finance and politics
- latest in defense and intelligence
- latest in animal news
- latest in music and entertainment

Random Quotes:
- Tape superconductors are a complex composite which include a layer of high-temperature material and a number of thin (about several nanometers) interlayers covering a flexible metal base. All layers are covered with silver and copper coating. The transport properties of such composites – the ability to transport electric currents with no energy dissipation (due to zero resistance) – vary significantly along the conductor because the techniques used to produce them are very complicated. This is why noncontact control over transport properties of long (over 100 meters) superconducting tapes is needed.
- NEW YORK (GenomeWeb) – A team led by researchers at the Salk Institute has developed a CRISPR-Cas9 genome editing system that epigenetically activates target genes without causing DNA double-strand breaks (DSBs). As they reported today in Cell, the investigators used the new system to treat diabetes, acute kidney disease, and muscular dystrophy in mouse models.

Current genome editing systems generally rely on the creation of DSBs, but this can limit their utility in treating disease because of the creation of off-target effects. Some researchers have tried repurposing the CRISPR-Cas9 system to create a dead Cas9 (dCas9)-VP64 system that enables target gene activation, which tends to allow the regulation of gene expression without the creation of DSBs, the authors noted. But implementing this system in vivo usually requires multiple single-guide RNAs (sgRNAs), making it less efficient, and exceeding the capacity of most common viral vectors, which are used to deliver these systems to living cells.
- They face the same struggles in life. They worry about their children. They want a decent career and enough money or privileges to have a decent life. They fall in love, they fight, they divorce. They don't think much about politics. They just want a peaceful life.

Of course, the environment in which they evolve is very different and requires a certain behaviour dictated by self-preservation.
...
I spent years with some of my managers and I have never been introduced to their families, except on one occasion.

One thing I hate is the constant monitoring we face. In our compound, there are guards every 20 or 30 metres. One of them even has a view to my living room.

The latest sanctions are putting all businesses in danger. Sanctions are not the answer. A strong middle class is a far more powerful tool for change and positive growth than anything else. Change from within is the only change that leaves no scars.
...
North Korean people are aware of their unique propaganda. When they're watching something on television and government propaganda comes on, they turn the TV off, much like our own behaviour at the beginning of a commercial break. That's pretty ironic, don't you think?

South Korean responses to North Koreans range from automatic hatred to a healthy curiosity, but hardly more than that. Young North Koreans find it difficult to integrate into South Korean society because they are so self-conscious about it. South Korea currently has a huge bullying problem.

I think North Korea wants to be isolated. It is in their best political interests to remain as a dangerous and non-negotiable state. It is totally fair that North Korea should have just as much independence and leverage as the US does, but that's not what other people in this world believe, unfortunately.

The US should keep out of Korean affairs and let grassroots regime change happen naturally as it did in South Korea. 
...
In my opinion, the sanctions approved by the UN, supported by China, aggravate the problem, instead of solving it. It reveals a certain hypocrisy of countries that are part of the UN Security Council since they all have atomic bombs. From my understanding, so-called "nuclear disarmament" is a consolidation of the monopoly of nuclear weapons in the hands of a small group of countries.

I do not feel threatened by the missile tests because I know the North Koreans do not intend to attack anyone. 
- Sustainment Efforts Already Falling Short and Grounding Aircraft

DOD and its international partners have more than 250 F-35 aircraft in their possession and plan to triple the fleet in the next 4 years. But DOD is already facing sustainment challenges that are degrading the fleet’s readiness for action, including:

    A 6-year delay in establishing F-35 part repair capabilities at military depots, resulting in long repair times;
    Parts shortages that kept aircraft from flying about 22% of the time from January through August 7, 2017;
    The inability to conduct all required maintenance during initial planned F-35 shipboard deployments.

As this graphic shows, average repair times have not matched program objectives.
- You hear it in the way people talk — “The Arabs,” “The Jews” — about people with whom they have been sentenced to share a tiny patch of soil atop a ridge with no strategic value, over which the world has been battling for thousands of years, and negotiating on and off for decades, with no end in sight.

The world knows Jerusalem by the Old City and its Golden Dome, its ancient wall from the time of Herod, its Holy Sepulcher, its rough-hewed stones flattered by brilliant sunlight.

But Jerusalem is not just its postcard vistas. A pilgrimage is not the same as living here. The day-in, day-out friction can be draining. And when the conflict bubbles up, even natives can question why they persist.
- “This is a rigged system," said the Republican president, who won in 2016 despite gaining more than three million fewer votes than his contender Hillary Clinton.

Republican vs. Democrat: No difference

According to Becker, the West Coast correspondent of the A.N.S.W.E.R coalition, this is not a Trump statement and it actually reflects “the reality” of the country.

“The big banks, the military industrial corporations, oil companies and other resource-based companies constitute the core of power inside of the United States,” he said.

There is practically no difference between a Democrat or a Republican to run the White House, he further suggested, adding that in either case, “we have huge military budgets in the United States, bigger than the rest of the world’s military budgets combined” that serve the true powers in the country’s political system.
- US President Donald Trump spends up to eight hours per day watching television, including mainstream news networks such as CNN, Fox and MSNBC, a new report has revealed.

Citing people close to Trump, major US-based daily The New York Times reported Saturday that the American president sits in front of a TV at least four hours and up to eight hours each day.

According to the report, the only person allowed to touch the remote control for the White House television is Trump himself and the technical support staff at his official residence in Washington.

However, during his trip to Asia last month Trump told reporters aboard Air Force One that he does not watch much television because he is too busy “reading documents.”

“Believe it or not, even when I’m in Washington or New York, I do not watch much television,” Trump said. "People that don’t know me, they like to say I watch television — people with fake sources. You know, fake reporters, fake sources.”

“But I don’t get to watch much television, primarily because of documents. I’m reading documents a lot,” Trump further insisted. “I actually read much more — I read you people much more than I watch television.”

This is while the American president had previously emphasized that he has “very little time for watching TV,” despite often tweeting about Fox News and CNN segments immediately after they air live.
- Indeed, a lot of people have forgotten that Russia also asked to join NATO (under both Gorbachev and Putin) and was rebuffed. Russia wanted IN the club. In fact, it was pathetic how much they wanted in the club and I thought so at the time. The West, steeped in Russophobia, was never going to let them in, and the Russians wasted somewhere between fifteen and twenty years before they got the message that the West, and the US in particular, was implacably hostile and intended to remain so.

Russia has about half the GDP of California. It is a superpower only in terms of nuclear weapons, though its army, technology, and geographic reach mean that it is still a great power.

It has been pushed into the arms of the Chinese, which from a geopolitical POV is ludicrous: Siberia is the most likely point of conflict between the Chinese and the Russians, driven by the realities of climate change, demographics, and aquifer depletion. Siberia has hardly any population, lots of land, and tons of water, and in twenty to thirty years, the Chinese are going to need that water, and in forty years or so they’re going to need the land.

It would have been easy to spin Russia in and make it the Eastern end of West. Instead, it has been made into a foe, and if the hysterics looking for someone to blame for their own electoral failures have their way, made into an actual enemy. (An enemy with enough nukes to destroy the entire world more than once. Sanity suggests picking better enemies.)

Whether or not the majority of Americans “want” this, emergent American behaviour shows this to be the path the US and the West are on.

Those of us who would prefer the world to survive might wish otherwise, but in-group thinking and the death wish are stronger than sanity or reason.

Putin is a result of shock doctrine, imposed by the West. Russian animosity is largely a result of Western actions that the Russians could not but interpret as hostile (including the color revolutions and the Western-backed Ukrainian coup.)

If, at this point, the Russians are trying to return the interference (and they probably are, just not nearly to the extent or effect the propaganda suggests) that is only what is to be expected, and Americans crying about a little bit of interference look ludicrous, given the US’s record of backing actual coups and constant, regular interference in virtually everyone else’s elections.

If you don’t want enemies, don’t treat them like your enemies. If you do, don’t be surprised when they act like your enemies.

And for God’s sake, Democrats, stop blaming Russia for an electoral failure that was primarily your own damn fault. Look to what you can control–your own behaviour–rather than seeking a scapegoat.
- FinFisher is surveillance software marketed by Lench IT Solutions, which sells it through law enforcement channels. It is "weaponised German surveillance malware used by intelligence agencies around the world to spy on journalists, political dissidents and others", according to WikiLeaks which revealed details of the spyware in 2014.

The new surveillance software, StrongPity2, had been visible in two unnamed countries since 8 October.

It was "using the same (and very uncommon) structure of HTTP redirects to achieve 'on-the-fly' browser redirection, only this time distributing StrongPity2 instead of FinFisher," Kafka said.

"We analysed the new spyware and immediately noticed several similarities to malware allegedly operated by the StrongPity group in the past."

Other common software packages which are being trojanised by StrongPity2 are CCleaner, Driver Booster, Opera, and WinRAR 5.50.
- Tax Justice Network Australia report author Jason Ward said: “What this research shows is that ExxonMobil has exploited Australia’s natural resources, made a ton of money and siphoned it all off overseas. By using notorious tax havens, high-interest internal loans and related party transactions they’ve sucked the taxpayer dry.

“ExxonMobil has $54 billion sitting in offshore bank accounts. Our research shows that much of that money has been funnelled from Australia through Exxon’s Dutch outfit, and ultimately through their Bahamas subsidiary. But the very idea of Exxon Australia being owned in the Bahamas raises more questions than answers.”

Ward said Exxon had misled the Senate Inquiry into Corporate Tax Avoidance in 2015 by failing to declare the company's Dutch-Bahamas structure.
- Washington: President Donald Trump directed the National Aeronautics and Space Administration to send American astronauts back to the moon and eventually to Mars, shifting the agency's mission from the study of Earth.

"This is a giant step toward that inspiring future and toward reclaiming America's proud destiny in space," Trump said on Monday at a White House ceremony, where he signed the new NASA directive. "And space has to do with so many other applications, including a military application. So we are the leader and we're going to stay the leader and we're going to increase it many fold."

Saving Money (without Sacrificing), Random Stuff, and More

- use price matching when you can to get an extra discount. Note, a lot of companies advertise low prices just to get you through the door. ...