I am going to do my best to divide this into different topics and keep it short and non-technical. It is going to be challenging.
GNU/Linux is definitely the way to go. Personally, I don’t trust Microsoft or Apple. In general I am an advocate of open source.
Personally, I have been using GNU/Linux since around 2000, and I am extremely comfortable with it. I understand that it can be a challenge where we still don’t have alternatives (or high-quality tools) to some specific software like CAD. But at the same time, Linux has better tools in many other respects.
There are different “distros” of GNU/Linux. A distro (or distribution) is the mixture of the tools from GNU, the Linux kernel and a way to manage packages. This is a very simplistic definition, but I don’t want to get too technical here.
Distributions for beginners:
Distributions for more experienced users:
Between iPhone and Android, I will always choose Android. The reason is that Android has a hybrid open/closed-source development model. It is possible to get a vanilla version of Android without Google Play (actually, nothing from Google at all) and then install an alternative to Google Play like F-Droid or the Aurora Store.
I don’t completely trust the Aurora Store because it is a front end to Google Play itself, and the software could have backdoors, but I still think it is a better alternative than having a phone that you paid for but Google owns.
Just like GNU/Linux, there are several distros. These can come “with gapps” or “without gapps”, gapps being “Google Apps”: in other words, Google Play, Gmail, Google Maps, etc. Without gapps is the way I personally prefer to go.
There are several custom ROMs like:
The most security-focused ones are Graphene and Calyx. Again, privacy and security is a deep rabbit hole. As deep as you can afford to go.
Personally, I have been using Lineage for years, plus some custom mods. On every single Android device I have owned, I flashed away the original Android and replaced it with a custom ROM without Google Apps (nothing from Google). I have been doing this since around 2006. I am very curious to try Calyx though. If I were starting from scratch, I would go for Calyx.
I have a MediaTek chipset on my phone and I hated it! That was definitely a bad purchase for this purpose. Don’t get me wrong, the phone is great, but it is not easy to flash a custom ROM on it. I would go for a Qualcomm chipset instead and make sure that the bootloader can be unlocked. This Ulefone Armor 23 looks promising. Apparently the 24 will also come with a Qualcomm chipset.
When in doubt, use a VPN or Tor circuits so that your traffic remains more private. You can get a router with that capability already built in, so you can route everything through that device.
Always use a firewall and disable the services you don’t use. Use strong passwords.
If you are looking for an appliance that can do all of that, you will need DD-WRT, pfSense or similar if you want a DIY approach and you are techie enough, or go for a commercial solution like the BraxRouter (I am not affiliated in any way and I haven’t tested it personally, so be cautious and do your own research - DYOR).
I prefer to avoid SMS authentication; I don’t trust the telephony system. They have too much power over, and knowledge about, their users.
I also avoid the Google prompts, the ones that appear on your phone while you are trying to log in on your computer. The reason is simple: they tie your identity on the computer to your phone, because you have to “click” on your phone. All traffic will be related.
I don’t use Google Authenticator or Authy, or any Android app for that matter, unless it is open source. Personally, I use one on my computer; this brings extra security and extra inconvenience. If I am not with my computer, I cannot log in anywhere, not even with my phone. I decided to live with that inconvenience.
Use 2FA (Two-Factor Authentication) whenever possible, with OTP (One-Time Password) codes. I prefer free TOTP software. The caveat to this approach is that you will have to handle your secrets yourself in a secure way, so you don’t lose them and nobody else has access to them. Please don’t even think about putting them into Dropbox!
I am a fairly technical guy, so I prefer to keep the secrets under my control with encrypted backups rather than “trusting” any other party.
I am not going to disclose publicly here what I use for security reasons. But if you know me personally and have questions ask me one on one and I will share more about this.
If you want more information, do your own research. I can point you to this video as a quick overview. I think the title is a bit misleading though; I trust 2FA with TOTP where the secrets are encrypted and properly backed up.
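To make TOTP less abstract: the code your authenticator shows is just an HMAC of the current 30-second counter, derived from a shared secret. This is not the software I use, just a minimal sketch in Python with the standard library; the base32 secret below is the RFC 6238 test key, not a real one:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, digits=6, period=30, t=None):
    """RFC 6238 TOTP: HMAC-SHA1 of the current time-step counter."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if t is None else t) // period)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: at t=59s with this key, the 8-digit code is 94287082
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", digits=8, t=59))
```

This also shows why backing up the secret matters so much: anyone holding the base32 string can regenerate every future code.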
Signal or no signal… that is the question…
I have been a huge fan of Telegram. At the beginning, you did not need a phone number; it was just an account you created (and you could create as many as you wanted) and that was it. No verification and no ties to your identity. Sadly, they now “require” a phone number, which is by design IMO. So I no longer trust Telegram. The same argument goes for Signal.
I am aware of Signal’s history, even before it was Signal, and of the original developers, but again… I don’t trust supplying my phone number to any app for any reason. There is no need for that, none.
I have always hated WhatsApp: the security is crappy, it belongs to Meta, it has a huge user base… So many reasons that I won’t even go there.
My preferred method for IM is XMPP, which is federated in nature, so there are “no central points” (again, loosely speaking). I would prefer to pick a server that is not crowded and is in a country outside of the Five Eyes 1.
In terms of privacy, OMEMO would be preferred, or asymmetric cryptography with GPG/PGP, or OTR (Off The Record) at the very least.
This is a topic that I struggled with for some time.
The bottom line is that I don’t trust large providers, because they are an easy target.
I wrote a note, not publicly available, analyzing email providers as of April 2022. Before that, I was using a really old Google Apps account that I created around 2005. It was very convenient, I am not going to lie, but I never felt good about it, so it was good that Google decided to kick me out by starting to charge me. If I have to pay, I prefer to pay for something more privacy-friendly than them.
Here is a table with a summary of some of the criteria that I used to analyze those providers.
Provider | Domains | IMAP | Location | EAR | Crypto pay | App | Alias | Storage | Price/m | 2FA | Cons
---|---|---|---|---|---|---|---|---|---|---|---
CounterMail | 15$ | yes | Sweden | yes | BTC | no | inf | 4G | ~5 | yes | Sweden
Mailbox.org | 50? | yes | Germany | yes | no? | no | 25/50 | 10G | 3 | yes | meta exposed, tracking
Runbox - Mini | 5 | yes | Norway | no | BTC | no | 100 | 10G | 2.91 | | no EAR
Protonmail - Plus/Pro | 10/*1* | bridge | Switzerland | yes | yes | | 5 | 5G/5G | 8/5 | | visibility, suspicious
Mailfence - Pro/Entry | 5/1 | yes | Belgium | no | BTC/LTC | yes? | 50/10 | 20G/5G | 7.5/2.5 | | not EAR
Posteo+ | 0 | yes | Germany | yes | no | ? | ? | | | | no domains
Ctemplar - Knight | 5 | no | Iceland | yes | XMR/BTC | yes? | 30 | 10G | 12 | yes | no IMAP
Tutanota | | no | Germany | yes | no | yes | | | | | own encryption, no IMAP
Fastmail Standard | 100 | | Australia | no? | | | | 30G | 5 | |
Personally, I liked Ctemplar, but I wanted IMAP, since I use Emacs/mu4e for email management and want the possibility of keeping a local copy of my emails if needed. I also use my phone to read email when needed. So I had to discard them. Depending on your use, you may not need IMAP.
Honestly, I am still not completely comfortable with my email provider, but I am better off than I was at Google’s.
Regarding usage, GPG is always preferred, but of course it depends on both parties using it. Be mindful that RSA-2048 is no longer considered safe against well-resourced adversaries: governments can likely break it, though probably not regular people. So I would go for 4096-bit keys.
I am not going to disclose publicly here what I use for security reasons. But if you know me personally and have questions ask me one on one and I will share more about this. I know that if you are a technical person you can find this yourself.
I don’t trust Zoom. Do your own research here. One of the founders is ex-Cisco Systems. Again… do your own research.
I would go for Jitsi or something that is open source. Of course, a case could be made that there is always a server as a middleman, to which, yes, I agree. At this point I don’t know of any better alternative; Jitsi is the way I would go, or Signal’s video chat feature, even though I don’t fully trust Signal either.
I don’t use Google Chrome, even less Microsoft Edge. I prefer the old Firefox.
As part of my setup, I don’t save cookies or history when I close the browser. I always prefer private mode when possible, and I use a bunch of extensions to make my unique fingerprint less obvious.
I avoid using Google whenever possible. I don’t trust DuckDuckGo much either. Search engines are the gatekeepers of the internet, and sadly there is no integrity in this business. I have read good things about Qwant.
I would also advise disabling JavaScript by default using NoScript or similar, as well as using other extensions to make tracking you less obvious, even though I don’t think there is much to be done here. Try to always use VPNs or Tor circuits. Be aware that a high level of privacy/security could make your browsing experience miserable!
Well, this is going to be controversial. We don’t use money, we use fiat currency, which is controlled by governments and manipulated via inflation and taxes as they see fit. It is hard to get out of that. Cash is always preferred, but it is not convenient, so balancing things out is up to the reader.
Precious metals could be an alternative, but probably everything is going to get digitized in the future, even precious metals, likely with certificates on the blockchain, as Colombia is already issuing for real estate. So I don’t have good news here.
Having some money in a privacy-focused cryptocurrency like Zcash, Monero, Verge or others could be a good idea. Be mindful of the fluctuations and the risks involved, as well as tax compliance.
Also be mindful that you should aim for a peer-to-peer market, which can be dangerous too, so you will have to “trust” the network somehow. If you use an exchange, it defeats the purpose of privacy.
If you go for a privacy coin, it is highly likely that you can run a local node. You can use a small device for this, like a Raspberry Pi or a refurbished mini-desktop or notebook. You will have to set up the service and download a full copy of the blockchain.
Honestly I would hedge for land, food security and water rather than save money. But that is just my opinion.
This is going to be controversial too. I am still learning about radio, but I would definitely have a ham radio and a rapport team; otherwise the radio alone is useless. Community is extremely important.
I prefer not to touch much on this subject. Do your own research. Maybe the Ghost network could help you. Check out this video on the LilyGO T-Deck device with the Meshtastic software for encrypted comms, and this video on Meshtastic and LoRa devices for general knowledge. I am fairly new to this, so do your own research. Not having to use a phone would be preferred IMO, so it is not tied to an IMEI number, MAC address, IP address and so forth.
Remember: favor VPNs or Tor circuits, and handle your secrets yourself in a secure and reliable way.
Don’t trust services where you need to supply personal data. Prefer services that also offer alternative payment methods such as crypto.
As I said in the beginning, privacy and security is a rabbit hole that can go very deep, even for a technical guy. So I am just trying to give an overview.
If there is a category that I forgot about that you would like to see here, send me a message.
For more on these “eyes” I suggest you read Permanent Record by Edward Snowden and/or No Place to Hide by Glenn Greenwald. ↩︎
Long story short, my journey of trying to learn Hebrew started probably between 2006 and 2008, and I haven’t had much success at it. Or let’s say that the expectations don’t match reality, due to a lack of consistent effort on my side.
I thought that I could somehow start with the most common words in the scriptures, and that is what motivated me to parse the Aleppo Codex and analyze its content in Python. I quickly learned that my approach was rather naive, given the prefixes that cause variations to the words even though the meaning of the word itself is the same.
This blog post is a second attempt to tackle the same problem, but with a more sophisticated and accurate approach using Text-Fabric.
A corpus of ancient texts and (linguistic) annotations represents a large body of knowledge.
Text-Fabric is a Python package for processing and accessing a corpus of ancient texts and linguistic annotations. In this specific case I am using the Hebrew Bible Database, containing the text of the Hebrew Bible augmented with linguistic annotations compiled by the Eep Talstra Centre for Bible and Computer of the VU University Amsterdam.
The text is based on the Biblia Hebraica Stuttgartensia edited by Karl Elliger and Wilhelm Rudolph, Fifth Revised Edition, edited by Adrian Schenker, © 1977 and 1997 Deutsche Bibelgesellschaft, Stuttgart.
The Text-Fabric version has been prepared by Dirk Roorda (Data Archiving and Networked Services), with thanks to Martijn Naaijer, Cody Kingham, and Constantijn Sikkel.
It is amazing to see the work these researchers did compiling all this data and making it public for free. I am very thankful for it.
I am using a literate programming approach with Doom Emacs and org-mode. This blog post is a bit more technically oriented, so it is okay if you skip over the code or don’t understand some of it.
This snippet will create a virtual environment and install some libs. Due to some incompatibilities with the word cloud lib, I had to downgrade to Python 3.6.
#virtualenv ~/.workon-home/venv-textfabric
virtualenv --python=/usr/bin/python3.6 ~/.workon-home/venv-textfabric
cd ~/.workon-home/venv-textfabric
source ./bin/activate.fish
pip install text-fabric pandas requests
Use C-c here on Doom Emacs:

(pyvenv-activate "~/.workon-home/venv-textfabric")

Use run-python and ober-eval-block-in-repl to evaluate each block (C-c r at the time being). For the sake of a more practical approach to writing using literate programming, the output of the blocks will follow the code. The documentation of the BHSA dataset can be found here.
from tf.app import use
import os
import collections
from itertools import chain
A = use("ETCBC/bhsa", hoist=globals())
A.indent(reset=True)
A.info("counting objects ...")
for otype in F.otype.all:
i = 0
A.indent(level=1, reset=True)
for n in F.otype.s(otype):
i += 1
A.info("{:>7} {}s".format(i, otype))
A.indent(level=0)
A.info("Done")
This is Text-Fabric 9.5.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html
122 features found and 0 ignored
0.00s counting objects ...
| 0.00s 39 books
| 0.00s 929 chapters
| 0.00s 9230 lexs
| 0.00s 23213 verses
| 0.00s 45179 half_verses
| 0.00s 63717 sentences
| 0.00s 64514 sentence_atoms
| 0.00s 88131 clauses
| 0.00s 90704 clause_atoms
| 0.01s 253203 phrases
| 0.01s 267532 phrase_atoms
| 0.01s 113850 subphrases
| 0.02s 426590 words
0.07s Done
Text-Fabric allows querying the “types” of words. We can do this by using:
print(F.sp.freqList())
(('subs', 125583), ('verb', 75451), ('prep', 73298), ('conj', 62737), ('nmpr', 35607), ('art', 30387), ('adjv', 10141), ('nega', 6059), ('prps', 5035), ('advb', 4603), ('prde', 2678), ('intj', 1912), ('inrg', 1303), ('prin', 1026))
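To put these counts in perspective, here is a small snippet, using the tuples verbatim from the output above, that computes each part of speech’s share of the total; nouns (subs) alone make up roughly 29% of all words.

```python
# Part-of-speech frequencies exactly as printed by F.sp.freqList() above
freq_list = (('subs', 125583), ('verb', 75451), ('prep', 73298),
             ('conj', 62737), ('nmpr', 35607), ('art', 30387),
             ('adjv', 10141), ('nega', 6059), ('prps', 5035),
             ('advb', 4603), ('prde', 2678), ('intj', 1912),
             ('inrg', 1303), ('prin', 1026))

# Total tagged words, and the relative share of the top five categories
total = sum(count for _, count in freq_list)
for sp, count in freq_list[:5]:
    print(f"{sp:>4}: {count:>7} ({100 * count / total:.1f}%)")
```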
These functions will facilitate the processing of the data. The first one gets the occurrences of a specific type of lexeme. I am not a linguist, so I am learning as I go and will try to explain things simply. The most common verb lexeme is אמר (say). It appears 5307 times. We can see that Yah indeed wants to communicate and instruct us.
A specific instance of that verb has other properties (node features), like morphology, which contains the verbal stem (qal, piel, nif, hif), the verbal tense (perf, impf, wayq) and gender (m, f), among other information. A lexeme is the representation of that word (a verb in this case) in the broad sense, i.e. regardless of verbal stem, tense, gender, etc.
import unicodedata

import pandas as pd

def remove_diacritics(text):
    # Strip vowel points and cantillation (Unicode combining marks) from
    # the Hebrew text, keeping only the consonantal form
    return "".join(c for c in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(c))

def get_lexeme_by_type(lex_type):
    # Collect every lexeme node whose part of speech (sp) matches lex_type
    rows = []
    A.indent(reset=True)
    for w in F.otype.s("lex"):
        if F.sp.v(w) != lex_type:
            continue
        row = {'w': w,
               'freq_lex': F.freq_lex.v(w),
               #'sp': F.sp.v(w),
               'lex_utf8': F.lex_utf8.v(w),
               'gloss': F.gloss.v(w),
               #'phono': F.phono.v(w),
               #'g_word_utf8': F.g_word_utf8.v(w),
               #'g_lex_utf8': F.g_lex_utf8.v(w),
               #'g_cons_utf8': F.g_cons_utf8.v(w),
               #'gn': F.gn.v(w),
               #'nu': F.nu.v(w),
               #'ps': F.ps.v(w),
               #'st': F.st.v(w),
               #'vs': F.vs.v(w),
               #'vt': F.vt.v(w),
               #'book': F.book.v(w),
               }
        rows.append(row)
    return pd.DataFrame(rows)

def export_df_to_org_table(input_df, rows_qty):
    # Emit the first rows_qty rows as an org-mode table:
    # header row, separator (None), then the data rows
    input_df["lex_utf8"] = input_df.apply(lambda x: remove_diacritics(x["lex_utf8"]), axis=1)
    output_df = input_df.head(rows_qty)
    return [list(output_df)] + [None] + output_df.values.tolist()
df_preps = get_lexeme_by_type('prep')
df_cloud = df_preps.sort_values(by=['freq_lex'], ascending=False)
print_df = export_df_to_org_table(df_cloud, 100)
w | freq_lex | lex_utf8 | gloss |
---|---|---|---|
1437629 | 20069 | ל | to |
1437602 | 15542 | ב | in |
1437606 | 10987 | את | <object marker> |
1437639 | 7562 | מן | from |
1437615 | 5766 | על | upon |
1437644 | 5517 | אל | to |
1437693 | 2902 | כ | as |
1437843 | 1263 | עד | unto |
1437805 | 1049 | עם | with |
1437865 | 878 | את | together with |
1443534 | 378 | ל | to |
1443536 | 345 | די | <relative> |
1438236 | 272 | למען | because of |
1445292 | 226 | ב | in |
1438491 | 142 | כמו | like |
1443543 | 119 | מן | from |
1445269 | 104 | על | upon |
1443531 | 63 | כ | like |
1445265 | 35 | עד | until |
1445281 | 22 | עם | with |
1438337 | 17 | בלעדי | without |
1443082 | 9 | במו | in |
1444752 | 4 | למו | to |
1445487 | 1 | ית | <nota accusativi> |
1445919 | 1 | לות | with |
Fair enough, this is a good starting point for learning some words.
Now let’s see the 30 most cited names in the scriptures:
df_nmpr = get_lexeme_by_type('nmpr')
df_cloud = df_nmpr.sort_values(by=['freq_lex'], ascending=False)
print_df = export_df_to_org_table(df_cloud, 30)
w | freq_lex | lex_utf8 | gloss |
---|---|---|---|
1437714 | 6828 | יהוה | YHWH |
1438941 | 2506 | ישראל | Israel |
1441856 | 1075 | דוד | David |
1438822 | 819 | יהודה | Judah |
1439439 | 766 | משה | Moses |
1438103 | 681 | מצרים | Egypt |
1441150 | 643 | ירושלם | Jerusalem |
1438343 | 438 | אדני | Lord |
1439060 | 406 | שאול | Saul |
1438702 | 349 | יעקב | Jacob |
1439473 | 347 | אהרן | Aaron |
1442014 | 293 | שלמה | Solomon |
1438115 | 262 | בבל | Babel |
1439714 | 218 | יהושע | Joshua |
1438843 | 213 | יוסף | Joseph |
1438262 | 182 | ירדן | Jordan |
1439203 | 180 | אפרים | Ephraim |
1438512 | 180 | מואב | Moab |
1438408 | 175 | אברהם | Abraham |
1439001 | 166 | בנימן | Benjamin |
1442009 | 154 | ציון | Zion |
1437762 | 151 | אשור | Asshur |
1438157 | 149 | ארם | Aram |
1439200 | 146 | מנשה | Manasseh |
1441940 | 145 | יואב | Joab |
1440774 | 140 | שמואל | Samuel |
1438890 | 134 | גלעד | Gilead |
1442693 | 129 | ירמיהו | Jeremiah |
1442394 | 109 | שמרון | Samaria |
1441986 | 109 | אבשלום | Absalom |
And now let’s generate a word-cloud image for the names in Hebrew.
And also a cloud image for the English translations of those names.
To me this is very exciting: with just a few lines of code (and of course standing on the shoulders of the giants who made TF and the BHSA dataset possible), we are able to perform simple tasks like this.
I want more, so let’s repeat the same procedure with verbs now, and increase the amount to 100 verbs sorted by frequency.
df_verbs = get_lexeme_by_type('verb')
df_cloud = df_verbs.sort_values(by=['freq_lex'], ascending=False)
print_df = export_df_to_org_table(df_cloud, 100)
w | freq_lex | lex_utf8 | gloss |
---|---|---|---|
1437621 | 5307 | אמר | say |
1437611 | 3561 | היה | be |
1437637 | 2629 | עשה | make |
1437773 | 2570 | בוא | come |
1437669 | 2010 | נתן | give |
1437760 | 1547 | הלך | walk |
1437623 | 1298 | ראה | see |
1437812 | 1159 | שמע | hear |
1438048 | 1138 | דבר | speak |
1437900 | 1082 | ישב | sit |
1437657 | 1068 | יצא | go out |
1437844 | 1038 | שוב | return |
1437764 | 965 | לקח | take |
1437799 | 944 | ידע | know |
1437724 | 890 | עלה | ascend |
1437851 | 847 | שלח | send |
1437769 | 835 | מות | die |
1437768 | 810 | אכל | eat |
1437628 | 732 | קרא | call |
1437894 | 653 | נשא | lift |
1437884 | 629 | קום | arise |
1437736 | 583 | שים | put |
1438023 | 548 | עבר | pass |
1438442 | 521 | עמד | stand |
1437899 | 500 | נכה | strike |
1437767 | 494 | צוה | command |
1437835 | 492 | ילד | bear |
1437766 | 468 | שמר | keep |
1437775 | 453 | מצא | find |
1437776 | 434 | נפל | fall |
1438193 | 377 | ירד | descend |
1437782 | 376 | בנה | build |
1437819 | 370 | נגד | report |
1439047 | 347 | מלך | be king |
1437816 | 332 | ירא | fear |
1437682 | 327 | ברך | bless |
1438467 | 316 | ענה | answer |
1438536 | 303 | פקד | miss |
1438046 | 297 | סור | turn aside |
1437685 | 292 | מלא | be full |
1438495 | 290 | חזק | be strong |
1437722 | 288 | עבד | work, serve |
1438067 | 288 | כרת | cut |
1437853 | 283 | חיה | be alive |
1438581 | 283 | איב | be hostile |
1438231 | 281 | קרב | approach |
1438522 | 237 | חטא | miss |
1438022 | 231 | זכר | remember |
1438347 | 231 | ירש | trample down |
1438906 | 225 | בקש | seek |
1437684 | 224 | רבה | be many |
1439718 | 223 | כתב | write |
1438074 | 217 | שתה | drink |
1439180 | 216 | כון | be firm |
1437788 | 214 | עזב | leave |
1438225 | 214 | נטה | extend |
1438878 | 213 | נצל | deliver |
1438479 | 212 | שכב | lie down |
1437866 | 212 | יסף | add |
1438565 | 211 | אהב | love |
1437706 | 206 | כלה | be complete |
1439443 | 205 | ישע | help |
1438002 | 204 | אסף | gather |
1438392 | 202 | שפט | judge |
1438251 | 193 | יכל | be able |
1438014 | 188 | רום | be high |
1438077 | 188 | גלה | uncover |
1439563 | 186 | אבד | perish |
1438554 | 185 | שבע | swear |
1438643 | 171 | שאל | ask |
1437711 | 171 | קדש | be holy |
1439420 | 171 | לחם | fight |
1438427 | 170 | חוה | bow down |
1439181 | 170 | בין | understand |
1437962 | 169 | בחר | examine |
1437869 | 168 | רעה | pasture |
1437885 | 167 | הרג | kill |
1438064 | 164 | דרש | inquire |
1438964 | 162 | טמא | be unclean |
1437750 | 161 | סבב | turn |
1438306 | 159 | נוס | flee |
1439475 | 154 | שמח | rejoice |
1438016 | 152 | כסה | cover |
1437797 | 150 | נגע | touch |
1438486 | 147 | שבר | break |
1438182 | 146 | נסע | pull out |
1438240 | 146 | הלל | praise |
1438652 | 146 | שנא | hate |
1437981 | 145 | שחת | destroy |
1438318 | 143 | רדף | pursue |
1438727 | 143 | חנה | encamp |
1437765 | 141 | נוח | settle |
1438323 | 140 | קרא | encounter |
1438461 | 135 | פנה | turn |
1438010 | 135 | פתח | open |
1437937 | 134 | חלל | defile |
1438923 | 134 | זבח | slaughter |
1438020 | 133 | שאר | remain |
1438373 | 133 | קבר | bury |
1437856 | 129 | שכן | dwell |
Here we can see the id of the lexeme (w), the frequency (freq_lex), the UTF-8 representation of the word without vowel pointing, and the English meaning used for the translation. All automatic. Pretty cool, isn’t it?
Now let’s create word clouds out of those verbs.
First, the top 30 verbs in Hebrew:
Now let’s try the 50 most common verbs in Hebrew:
And now let’s increase the amount to 100 verbs in Hebrew:
Now let’s generate a word-cloud image for the top 30 verbs in English:
Similarly, for the top 50 verbs in English:
And finally, the top 100 verbs in English:
I have noticed something “unexpected” here with vowel pointing1. The word “turn” is the biggest one in English, yet it is not the most frequent word in the table. Why is that? Although I am not completely sure, I believe it is because it appears twice on the list: there are two Hebrew words translated as “turn”, each with its own frequency, and I have not taken that into account in the code. This doesn’t happen with the Hebrew words, since the frequency is already aggregated in the freq_lex column.
To solve this, I would probably have to process the pandas dataframe to manually sum the frequencies of the same English word. But I won’t do that right now, since I am more curious about exploring other words.
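For reference, should you want to do the summing yourself, it is essentially a one-liner with pandas groupby. A sketch on a toy frame using the three glosses involved (numbers taken from the tables above):

```python
import pandas as pd

# Toy frame mimicking df_verbs: two Hebrew lexemes share the gloss "turn"
df = pd.DataFrame({
    "lex_utf8": ["סבב", "פנה", "אמר"],
    "gloss":    ["turn", "turn", "say"],
    "freq_lex": [161, 135, 5307],
})

# Sum the frequencies per English gloss before building the cloud
gloss_freq = df.groupby("gloss")["freq_lex"].sum().sort_values(ascending=False)
print(gloss_freq.to_dict())  # {'say': 5307, 'turn': 296}
```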
Furthermore, the “mistake” is rather interesting and I believe it speaks volumes in itself, since I truly believe that the will of our Father’s heart points toward teshuva indeed, a turning from our own ways. Seeing this unexpected result was fascinating to me.
df = get_lexeme_by_type('subs')
df_cloud = df.sort_values(by=['freq_lex'], ascending=False)
print_df = export_df_to_org_table(df_cloud, 100)
w | freq_lex | lex_utf8 | gloss |
---|---|---|---|
1437679 | 5412 | כל | whole |
1437836 | 4937 | בן | son |
1437605 | 2601 | אלהים | god(s) |
1438272 | 2523 | מלך | king |
1437610 | 2504 | ארץ | earth |
1437630 | 2304 | יום | day |
1437787 | 2186 | איש | man |
1437616 | 2127 | פנה | face |
1437987 | 2063 | בית | house |
1438194 | 1866 | עם | people |
1437852 | 1618 | יד | hand |
1438181 | 1441 | דבר | word |
1437789 | 1217 | אב | father |
1437903 | 1090 | עיר | town |
1437634 | 970 | אחד | one |
1437801 | 887 | עין | eye |
1437662 | 876 | שנה | year |
1437747 | 864 | שם | name |
1438084 | 800 | עבד | servant |
1437721 | 788 | אין | <NEG> |
1437783 | 781 | אשה | woman |
1437664 | 768 | שנים | two |
1437674 | 754 | נפש | soul |
1438327 | 750 | כהן | priest |
1437942 | 715 | אחר | after |
1437861 | 706 | דרך | way |
1437867 | 629 | אח | brother |
1437940 | 602 | שלש | three |
1437973 | 601 | לב | heart |
1437746 | 599 | ראש | head |
1437944 | 588 | בת | daughter |
1437620 | 582 | מים | water |
1437941 | 579 | מאה | hundred |
1438017 | 558 | הר | mountain |
1438101 | 555 | גוי | people |
1437691 | 553 | אדם | human, mankind |
1437946 | 506 | חמש | five |
1437640 | 505 | תחת | under part |
1437813 | 505 | קול | sound |
1437889 | 498 | פה | mouth |
1438530 | 492 | אלף | thousand |
1437897 | 490 | שבע | seven |
1437932 | 490 | עוד | duration |
1437707 | 486 | צבא | service |
1439456 | 469 | קדש | holiness |
1437745 | 454 | ארבע | four |
1437854 | 438 | עולם | eternity |
1438456 | 422 | משפט | justice |
1437608 | 421 | שמים | heavens |
1438238 | 421 | שר | chief |
1437636 | 418 | תוך | midst |
1437859 | 412 | חרב | dagger |
1437627 | 407 | בין | interval |
1438247 | 403 | כסף | silver |
1437645 | 401 | מקום | place |
1438050 | 401 | מזבח | altar |
1437648 | 396 | ים | sea |
1437752 | 389 | זהב | gold |
1437618 | 378 | רוח | wind |
1438381 | 378 | אש | fire |
1438578 | 376 | נאם | speech |
1438471 | 374 | שער | gate |
1437886 | 360 | דם | blood |
1437913 | 345 | אהל | tent |
1438610 | 336 | סביב | surrounding |
1438449 | 335 | אדון | lord |
1437654 | 330 | עץ | tree |
1438645 | 325 | כלי | tool |
1437716 | 320 | שדה | open field |
1437965 | 315 | עשרים | twenty |
1438523 | 315 | נביא | prophet |
1437970 | 313 | רעה | evil |
1438278 | 308 | מלחמה | war |
1437704 | 300 | מאד | might |
1437842 | 298 | לחם | bread |
1438040 | 297 | עת | time |
1437882 | 293 | חטאת | sin |
1438051 | 286 | עלה | burnt-offering |
1438001 | 284 | ברית | covenant |
1438005 | 284 | חדש | month |
1437729 | 277 | אף | nose |
1438239 | 274 | פרעה | pharaoh |
1437951 | 274 | שש | six |
1437870 | 274 | צאן | cattle |
1437755 | 273 | אבן | stone |
1438299 | 270 | מדבר | desert |
1437781 | 270 | בשר | flesh |
1439127 | 252 | מטה | staff |
1438519 | 252 | לבב | heart |
1437990 | 247 | אמה | cubit |
1438038 | 247 | רגל | foot |
1438498 | 245 | חסד | loyalty |
1438977 | 244 | חיל | power |
1438338 | 240 | נער | boy |
1438145 | 240 | גבול | boundary |
1438372 | 237 | שלום | peace |
1438328 | 235 | אל | god |
1437956 | 235 | מעשה | deed |
1437893 | 231 | עון | sin |
1437653 | 229 | זרע | seed |
df = get_lexeme_by_type('conj')
df_cloud = df.sort_values(by=['freq_lex'], ascending=False)
print_df = export_df_to_org_table(df_cloud, 100)
w | freq_lex | lex_utf8 | gloss |
---|---|---|---|
1437609 | 50272 | ו | and |
1437638 | 5500 | אשר | <relative> |
1437624 | 4483 | כי | that |
1437878 | 1068 | אם | if |
1443538 | 731 | ו | and |
1438644 | 320 | או | or |
1437964 | 138 | ש | <relative> |
1437798 | 133 | פן | lest |
1438420 | 22 | לו | if only |
1445238 | 16 | הן | if |
1439657 | 15 | זו | <relative> |
1438910 | 14 | לולא | unless |
1445279 | 7 | להן | but |
1445073 | 2 | אלו | if |
1441905 | 1 | אין | if |
df = get_lexeme_by_type('adjv')
df_cloud = df.sort_values(by=['freq_lex'], ascending=False)
print_df = export_df_to_org_table(df_cloud, 100)
w | freq_lex | lex_utf8 | gloss |
---|---|---|---|
1437665 | 526 | גדול | great |
1437703 | 469 | טוב | good |
1437969 | 424 | רב | much |
1437742 | 347 | רע | evil |
1439474 | 292 | לוי | Levite |
1438130 | 288 | פלשתי | Philistine |
1438464 | 262 | רשע | guilty |
1437675 | 239 | חי | alive |
1437978 | 206 | צדיק | just |
1438044 | 182 | ראשון | first |
1438444 | 181 | זקן | old |
1437934 | 166 | אחר | other |
1437642 | 156 | שני | second |
1439171 | 138 | חכם | wise |
1439676 | 119 | ישר | right |
1439730 | 116 | קדוש | holy |
1437708 | 98 | שביעי | seventh |
1438003 | 96 | טהר | pure |
1438975 | 94 | חלל | pierced |
1437979 | 91 | תמים | complete |
1440077 | 88 | טמא | unclean |
1438135 | 87 | אמרי | Amorite |
1438568 | 85 | רחוק | remote |
1437658 | 83 | שלישי | third |
1442586 | 82 | יהודי | Jewish |
1437697 | 82 | זכר | male |
1438499 | 76 | קרוב | near |
1439788 | 75 | עני | humble |
1438144 | 73 | כנעני | Canaanite |
1444427 | 70 | כסיל | insolent |
1439969 | 70 | זר | strange |
1442699 | 64 | כשדי | Chaldean |
1439798 | 61 | אביון | poor |
1439463 | 56 | חזק | strong |
1437671 | 56 | רביעי | fourth |
1437667 | 54 | קטן | small |
1438329 | 53 | עליון | upper |
1439418 | 52 | חדש | new |
1438949 | 51 | אחרון | at the back |
1438574 | 49 | ירא | afraid |
1439174 | 48 | דל | poor |
1438386 | 48 | חתי | Hittite |
1438083 | 47 | קטן | small |
1438883 | 45 | נכרי | foreign |
1438233 | 43 | יפה | beautiful |
1438641 | 43 | נקי | innocent |
1438230 | 41 | כבד | heavy |
1438134 | 41 | יבוסי | Jebusite |
1438018 | 37 | גבה | high |
1438770 | 37 | מר | bitter |
1439209 | 36 | קשה | hard |
1441758 | 35 | יקר | rare |
1438416 | 35 | ערל | uncircumcised |
1438309 | 34 | עברי | Hebrew |
1437686 | 34 | חמישי | fifth |
1439960 | 33 | ימני | right-hand |
1441098 | 32 | חסיד | loyal |
1442307 | 32 | פנימי | inner |
1440562 | 31 | צר | narrow |
1438455 | 31 | עצום | mighty |
1439795 | 31 | שמיני | eighth |
1438235 | 30 | מצרי | Egyptian |
1438855 | 29 | לבן | white |
1438030 | 29 | עשירי | tenth |
1438376 | 28 | שלם | complete |
1437705 | 28 | ששי | sixth |
1439654 | 27 | אדיר | mighty |
1440033 | 27 | נדיב | willing |
1439212 | 26 | כן | correct |
1443017 | 26 | אויל | foolish |
1439472 | 26 | עור | blind |
1442311 | 25 | חיצון | external |
1440433 | 25 | כשי | Ethiopian |
1440474 | 25 | בצור | fortified |
1438137 | 25 | חוי | Hivite |
1438253 | 23 | פרזי | Perizzite |
1439979 | 23 | עשיר | rich |
1438509 | 22 | צעיר | little |
1439345 | 22 | עז | strong |
1438606 | 21 | מלא | full |
1440818 | 21 | עמוני | Ammonite |
1440434 | 21 | ענו | humble |
1440919 | 21 | שמח | joyful |
1438972 | 21 | רחב | wide |
1442916 | 20 | עריץ | ruthless |
1441742 | 20 | רעב | hungry |
1438266 | 19 | חטא | sinful |
1445272 | 19 | רב | great |
1440889 | 19 | רענן | luxuriant |
1437997 | 19 | תחתי | lower |
1440295 | 18 | תשיעי | ninth |
1439610 | 18 | שכיר | hired |
1441042 | 18 | נבל | stupid |
1440613 | 18 | ראובני | Reubenite |
1440162 | 17 | עמק | deep |
1439751 | 17 | חפשי | free |
1438706 | 17 | עיף | faint |
1440167 | 17 | שפל | low |
1441912 | 16 | חסר | lacking |
1441561 | 16 | אביר | strong |
df = get_lexeme_by_type('prps')
df_cloud = df.sort_values(by=['freq_lex'], ascending=False)
print_df = export_df_to_org_table(df_cloud, 100)
w | freq_lex | lex_utf8 | gloss |
---|---|---|---|
1437749 | 1394 | הוא | he |
1437998 | 874 | אני | I |
1437820 | 747 | אתה | you |
1437753 | 485 | היא | she |
1437817 | 359 | אנכי | I |
1437967 | 291 | המה | they |
1438066 | 283 | אתם | you |
1437807 | 269 | הם | they |
1438256 | 120 | אנחנו | we |
1438234 | 57 | את | you |
1437961 | 48 | הנה | they |
1445253 | 16 | אנה | I |
1445354 | 15 | אנתה | you |
1445320 | 15 | הוא | he |
1445913 | 9 | המו | they |
1445260 | 7 | היא | she |
1439211 | 5 | נחנו | we |
1438874 | 4 | אתנה | you |
1445496 | 4 | אנחנא | we |
1445384 | 3 | המון | they |
1445421 | 3 | אנון | they |
1445255 | 1 | אנתון | you |
1443994 | 1 | אתן | you |
1445760 | 1 | אנין | they |
df = get_lexeme_by_type('advb')
df_cloud = df.sort_values(by=['freq_lex'], ascending=False)
print_df = export_df_to_org_table(df_cloud, 100)
w | freq_lex | lex_utf8 | gloss |
---|---|---|---|
1437737 | 834 | שם | there |
1437804 | 769 | גם | even |
1438351 | 577 | כה | thus |
1437641 | 546 | כן | thus |
1437850 | 433 | עתה | now |
1437896 | 200 | לכן | therefore |
1438021 | 161 | אך | only |
1437936 | 141 | אז | then |
1437796 | 133 | אף | even |
1437974 | 109 | רק | only |
1438250 | 96 | יחדו | together |
1438489 | 82 | פה | here |
1445293 | 57 | אדין | then |
1438375 | 50 | הנה | here |
1439618 | 50 | יומם | by day |
1438389 | 45 | אולי | perhaps |
1439587 | 37 | ככה | thus |
1438807 | 32 | חנם | in vain |
1440400 | 25 | פתאם | suddenly |
1438796 | 19 | אולם | but |
1438485 | 16 | הלאה | further |
1438912 | 16 | ריקם | with empty hands |
1440112 | 13 | פנימה | within |
1445334 | 13 | כען | now |
1438403 | 12 | הלם | hither |
1442642 | 9 | אמנם | really |
1445336 | 8 | כן | thus |
1445965 | 7 | אספרנא | exactly |
1438081 | 7 | אחרנית | backwards |
1445901 | 5 | כנמא | thus |
1438451 | 5 | אמנם | really |
1445348 | 5 | ברם | but |
1445714 | 4 | אף | also |
1445973 | 4 | תמה | there |
1445250 | 3 | להן | therefore |
1445917 | 3 | כענת | now |
1443334 | 3 | דומם | silently |
1444590 | 3 | אזי | then |
1445237 | 2 | אזדא | publicly known |
1444976 | 2 | להן | therefore |
1438528 | 2 | אמנה | indeed |
1445939 | 1 | כעת | now |
1445703 | 1 | טות | fastingly |
1445771 | 1 | כה | here |
1442542 | 1 | מסח | alternatively |
1445670 | 1 | עלא | above |
1445537 | 1 | אחרין | at last |
1445492 | 1 | צדא | really |
1445251 | 1 | תנינות | again |
1445069 | 1 | עדן | hitherto |
1445068 | 1 | עדנה | hitherto |
1444323 | 1 | קדרנית | mourningly |
1446033 | 1 | אדרזדא | with zeal |
df = get_lexeme_by_type('prde')
df_cloud = df.sort_values(by=['freq_lex'], ascending=False)
print_df = export_df_to_org_table(df_cloud, 100)
w | freq_lex | lex_utf8 | gloss |
---|---|---|---|
1437938 | 1177 | זה | this |
1437712 | 746 | אלה | these |
1437784 | 604 | זאת | this |
1443532 | 58 | דנה | this |
1445488 | 14 | אלך | these |
1441693 | 11 | זה | this |
1438482 | 9 | אל | these |
1445926 | 7 | דך | that |
1441583 | 7 | לז | this there |
1445595 | 6 | דא | this |
1445971 | 6 | דך | that |
1445409 | 5 | אלין | these |
1445363 | 3 | דכן | that |
1444075 | 2 | זו | this |
1438655 | 2 | לזה | this there |
1443546 | 1 | אלה | these |
1445970 | 1 | אל | these |
1443998 | 1 | לזו | this there |
df = get_lexeme_by_type('inrg')
df_cloud = df.sort_values(by=['freq_lex'], ascending=False)
print_df = export_df_to_org_table(df_cloud, 100)
w | freq_lex | lex_utf8 | gloss |
---|---|---|---|
1437821 | 743 | ה | <interrogative> |
1437877 | 178 | למה | why |
1438738 | 72 | מדוע | why |
1438719 | 61 | איך | how |
1438443 | 45 | איה | where |
1438846 | 43 | מתי | when |
1438396 | 41 | אן | whither |
1437815 | 39 | אי | where |
1440801 | 17 | איכה | how |
1438803 | 17 | אין | whence |
1439081 | 10 | איפה | where |
1445345 | 6 | ה | <interrogative> |
1445029 | 4 | איככה | how |
1444096 | 3 | אהי | where |
1445070 | 2 | אי | how |
1445785 | 2 | היך | how |
1441950 | 1 | אל | where |
1442508 | 1 | איכה | how |
: I will have to double check this. From the documentation of the API I understood that the lexemes carry no vowel pointing; however, the shin/sin still seems to carry its diacritic dot, so I had to strip the diacritics from each word, because the word cloud did not render properly with them. ↩︎
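Stripping the niqqud and the shin/sin dots comes down to removing Unicode combining marks. A minimal sketch (the helper name is mine):

```python
import unicodedata

def strip_diacritics(word):
    """Drop Hebrew combining marks (niqqud, cantillation, shin/sin dots)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("שָׁלוֹם"))  # consonants only: שלום
```

This works because all Hebrew points are combining characters in Unicode, so `unicodedata.combining()` returns a non-zero class for each of them.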
Now I am using org-roam as my notes management app. It is great and it works on top of org-mode and, of course, Emacs. Not everyone feels comfortable with this since it requires being okay with technical bits and bytes. I just love it…
I started using v1.0, which did not support mapping org headings, only top-level file headers. That was already brilliant in itself. With v2.0, org-roam added the ability to map headings too. What was good became even better! The only caveat at the time was that org-roam-server stopped working, which is understandable. Now there is a new project called org-roam-ui, which is basically org-roam-server for v2.0 with headings support.
Recently I trimmed my DB; I am trying to find a balance between using headings and not using them. My default Emacs config used to create UUIDs for each heading. This was good at first, but I had to adapt it for org-roam v2.0… I had over 9000 notes, which is impractical to be honest… So I cleared the UUIDs from the headers of several files that were not needed, and now I have over 3000 notes. That is still too many IMO, so further trimming is needed, but that cleanup needs more thought.
Now let's preview this thing… Most screenshots have been anonymized. At the time of this writing, this is what my notes look like. Keep in mind that out of all of my notes, only a few selected ones are shared here on my website.
I am going to show a little video sample first.
Figure 1 shows how the exobrain looks when considering all notes.
Figure 2 shows how the exobrain looks when considering only the file headers and subheadings.
Figure 3 shows my Emacs configuration file cluster and how it relates to other notes.
Figure 4 shows my parashot study notes.
Figure 5 shows my beekeeping notes and log, which is quite extensive by the way.
Figure 6 shows my trading test notes. I developed my own tool to document my simulated trades and tests.
Figure 7 shows my simulated trading log. I integrated my own Python tool with yasnippets, so Emacs could get all the goodies out of it. The result is a very detailed (and possibly overkill) simulated trade log. This will probably be decoupled from my org-roam database; I don't really need to search through all of that.
Figure 8 shows my research notes on automated trading strategies.
Figure 9 shows my research notes on agroforestry.
Figure 10 shows my journal of the current year, 2022: TODO tasks and how they integrate with the rest of the notes.
Figure 11 shows my homesteading notes and all the ongoing projects.
So I rolled up my sleeves and thought: I will google the Aleppo codex in plain text so I can download it and later use it for analysis. The problem is, I couldn't find it…
The next challenge then is: how can I produce a plain text version of the Aleppo codex? I thought about parsing a MySword module, which is in reality a SQLite database, so that should be easy… But then, while I was doing something totally unrelated (probably around the bees or my first batch of mead), it hit me… I have several Bibles from the SWORD project already installed on my notebook, and it is all OSS, so… there is probably a Python wrapper around diatheke or similar… and sure there is =)
If you want the technicalities, go read my note on How to parse the Aleppo codex and analyze its content in python.
But if you are not a geeky human, you are probably interested only in the results rather than the bits and bytes. This is why I split the whole thing into two notes, one more IT related and one more Hebrew related. So here we go, brace yourselves, this is probably going to be long.
I am not sure how to “slice” the codex so the chunks make sense for an analysis: per chapter? Per book? Per “story”? Or another slicing entirely, like Torah, Neviim and Ketuvim. Since it is not clear, I will start by slicing per book.
Hebrew has some peculiarities one of them being the vowel pointing. This brings a ton of challenges. For the sake of simplicity I had to stick to a codex that does not include niqqud whatsoever. I am not sure if this is a good approach or not because two different words without niqqud can render the same writing yet have totally different meaning.
Another problem is that there are trivially known words like לא or אל, and words that are not translated yet are used very much, like את. I implemented a filter to be able to skip these words. Not because they are not relevant (the Aleph-Tav holds a ton of secrets and importance), but because mixed in with them are irrelevant tokens counted as words, like opening and closing brackets, parentheses and similar. So I implemented an ignore list for words that are either well known or really irrelevant.
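A minimal sketch of such a filtered frequency count (the ignore list below is just an example, not my actual list):

```python
from collections import Counter

# Example ignore list: well-known words plus non-word tokens
IGNORE = {"את", "לא", "אל", "(", ")", "[", "]"}

def top_words(tokens, n=5, ignore=IGNORE):
    """Count token frequencies, skipping tokens on the ignore list."""
    counts = Counter(t for t in tokens if t not in ignore)
    return counts.most_common(n)

tokens = ["את", "יהוה", "(", "יהוה", "אשר", "לא", "יהוה"]
print(top_words(tokens))  # [('יהוה', 3), ('אשר', 1)]
```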
It was rather interesting to me how fast the repetitions decline for unfiltered words. For instance, the most used word is את (h854) with 7073 occurrences, yet a mere 30 words down the list there is a 90% decline in occurrences: משה (h4872) with 704 occurrences.
Word | Count | Strong | |
---|---|---|---|
את | 7073 | h854 | |
יהוה | 5611 | h3068 | |
אשר | 4629 | h834 | |
אל | 3882 | h410 | |
כי | 3553 | h3588 | |
על | 3140 | h5921 | |
לא | 2834 | h3809 | |
כל | 2757 | h3605 |
ואת | 2190 | h854 | |
ישראל | 2085 | h3479 | |
ויאמר | 2043 | h559 | Related |
בני | 1650 | h1123 | |
בן | 1607 | h1121 | |
ולא | 1447 | h3809 | Related |
The words above make up the top of the list; these are the words that follow:
Word | Count | Strong | |
---|---|---|---|
לו | 1045 | h3863 | |
איש | 1027 | h376 | |
המלך | 1014 | h4428 | |
בית | 1003 | h1004 | |
מלך | 1000 | h4427 | |
הוא | 910 | h1931 | |
עד | 904 | h5704 | |
לאמר | 897 | h559 / 564 | |
לך | 871 | h | |
הארץ | 856 | h776 | |
ויהי | 808 | h | |
אמר | 797 | h559 | |
דבר | 787 | h1697 | |
העם | 724 | h5971 | Related |
וכל | 712 | h3606 | Related |
משה | 704 | h4872 | |
שם | 681 | h8043 | |
מן | 661 | h4478 | |
לי | 660 | h | |
הזה | 650 | h1957 | |
אני | 635 | h589 | |
יהודה | 632 | h3063 | |
לפני | 615 | h3942 | |
להם | 607 | h3859 | |
אם | 607 | h518 | |
אלהים | 597 | h430 | |
אדני | 587 | h136 | |
דוד | 583 | h1730 | |
אתה | 582 | h857 | |
עם | 582 | h5973 |
Word | Count | Strong |
---|---|---|
את | 2569 | h |
אשר | 1617 | h |
יהוה | 1493 | h |
אל | 1241 | h |
על | 949 | h |
כל | 921 | h |
כי | 895 | h |
לא | 861 | h |
ואת | 809 | h |
בני | 620 | h |
ויאמר | 619 | h |
משה | 598 | h |
ישראל | 508 | h |
ס | 491 | h |
הוא | 420 | h |
הארץ | 352 | h |
ולא | 345 | h |
לו | 330 | h |
Word | Count | Strong |
---|---|---|
את | 658 | h854 |
אשר | 351 | h834 |
ויאמר | 337 | h559 |
אל | 335 | h410 |
כי | 263 | h3588 |
ואת | 205 | h854 |
על | 203 | h5921 |
כל | 199 | h3605 |
אלהים | 150 | h430 |
יוסף | 144 | h3130 |
יעקב | 142 | h3290 |
יהוה | 141 | h3068 |
לא | 137 | h3809 |
הארץ | 126 | h776 |
לו | 126 | h3863 |
ויהי | 125 | h1961 |
בני | 118 | h1123 |
הוא | 110 | h1931 |
אברהם | 108 | h85 |
שנה | 102 | h8141 |
The more I try to know, the less I feel I know… This coding-linguistics area is fascinating and intriguing. I feel like a taxonomy of the area is needed.
This will probably take a lot of time, which I don't have. So I want to try to balance being effective and being efficient. I cannot afford to spend much time on this project, but on the other hand the Hebrew language is really, really appealing to me. I have yet to find a balance on how to proceed.
As part of documenting things I am going to include some very interesting resources that I have found.
From the time I spent researching I would probably go with the BHSA DB. This is a proposed roadmap:
Also, if your focus is more on learning Hebrew, check the Parabible website, which looks really lean and straightforward.
Recently an upgrade on my notebook went bad and broke something. I wasn't sure what went wrong, but since I had just one system at that time, I had to figure out how to fix it with the help of my phone. Basically, I installed EtchDroid so I could download an ISO and flash it to a flash drive directly from my phone. Later I could boot from that flash drive, mount the LUKS/LVM partitions and fix the bootloader. Here is a short how-to.
#+begin_src sh
cryptsetup luksOpen /dev/disk/by-partlabel/cryptlvm lvm
mount /dev/storage/root /mnt
mount /dev/storage/home /mnt/home
mount /dev/sda1 /mnt/boot
cd /mnt
mount -t proc /proc proc
#mount -t sysfs sys sys
#mount -o bind dev dev
mount -o bind run run
arch-chroot /mnt
dhcpcd eth0
#+end_src

#+begin_src sh
GRUB_CMDLINE_LINUX="cryptdevice=/dev/sda2:lvm"
GRUB_PRELOAD_MODULES="part_gpt part_msdos cryptodisk luks"
#+end_src

#+begin_src sh
grub-mkconfig > /boot/grub/grub.cfg
grub-install --efi-directory=/boot --target=x86_64-efi /dev/sda
mkinitcpio -p linux
#+end_src
My first couple of blogs were on Blogspot around 2005 or so; later I switched to DokuWiki, which is awesome but didn't integrate well with my workflow. So I tried some extensions to write directly in Markdown, but that didn't work well for me. Then I moved to Hyde, which is a Python static website generator. I invested quite some time customizing everything to my needs, and while it was functional, I had to write in Markdown (which I don't like…). I tried to write in RST at that time, without success.
Then I moved from Vim to Emacs and discovered org-mode… And yeah, everything changed… I love Emacs + Org-Mode and I use it quite a lot (and I still use Vim occasionally).
I wanted to write directly in Org-Mode for my new blog to see if content flows this time (hopefully) and I can achieve the integration I was looking for all these years.
The version of this blog uses plain Org-Mode and Hugo with ox-hugo as a middle helper. So far, so good, I just migrated my old posts which are few in number.
What I am looking forward to is creating a mix of a blog and a digital garden for my notes (also known as a second brain or exobrain).
The rest of the file is actually a sandbox of general Org-Mode formatting to check out how it renders on Hugo itself.
There is a paragraph under h1
H3
Something
This is bold, italic, code
, verbatim
and strike text.
Style *
isn’t supported.
[1/3]
[33%]
Heading and has special class however <ul>
and <li>
are plain.
Items are added with special class.
number | description |
---|---|
1 | looooong long name |
5 | name |
<tr>
has even
and odd
classes.
Emacs Lisp:
(defun negate (x)
"Negate the value of x."
(- x))
(print
(negate 10))
There are interesting classes like sourceCode and example. Also there are HTML5 attributes prefixed with rundoc-.
Haskell:
factorial :: Int -> Int
factorial 0 = 1
factorial n = n * factorial (n - 1)
LaTeX characters are wrapped in <em> and math inside <span class="math inline">.
\begin{align*} 8 * 3 &= 8 + 8 \\ &= 24 \end{align*}
NOTE: There is standard LaTeX embedded above which is skipped during compilation to HTML.
This is using MathJax
\[\sum_{i=0}^n i^2 = \frac{(n^2+n)(2n+1)}{6}\]
Tags are not visible in render
Org mode is amazing. So is Hakyll & Pandoc.
http://media.riffsy.com/images/f8534774b678ad1932b379a03460680b/raw
Images have to have an extension like:
then it can be loaded even from another origin.
credits to nihilmancer
I use Arch Linux and I prepared a repository with my custom install method based on archinstaller. Basically, I have custom partitioning based on LVM on top of LUKS and some variables to start the install method.
Here is an overview of the parts on this tutorial:
I will assume that the error is similar to mine. After opening the LUKS device, I had the following problem:
ERROR: device /dev/mapper/storage-root not found
To create a bootable USB you need to download the latest Arch Linux ISO and then use dd to dump it to a flash drive.
dd bs=4M if=/path/to/archlinux.iso of=/dev/sdx status=progress && sync
After completion, use the same flash drive to boot the broken computer. You generally need to hit a key in the F5-F6 range to choose the boot method.
Once the system has booted, we need to mount everything the system would mount during the boot stage. Adapt the LUKS label and LVM devices to your needs.
# Open the LUKS device.
cryptsetup luksOpen /dev/disk/by-partlabel/cryptlvm lvm
# Mount the root partition
mount /dev/storage/root /mnt
# Mount the boot partition
mount /dev/sda1 /mnt/boot
# Mount proc, sys and dev
mount -t proc proc proc/
mount -t sysfs sys sys/
mount -o bind /dev dev/
# Switch to bash
bash
# Chroot
chroot /mnt
# Bring up network if needed
dhcpcd eth0
At this point you should have a shell in a recovery environment, ready to fix the issue. So let's fix it.
To fix the issue we are basically going to perform an upgrade and recreate the initial ramdisk.
# Update the packages database
pacman -Syy
# Upgrade the packages
pacman -Syu
# Update udev package
pacman -S udev
# Update mkinitcpio package
pacman -S mkinitcpio
# Recreate the initial ramdisk
mkinitcpio -p linux
# Exit the chroot environment
exit
# Reboot
reboot
If everything went well, you now should be able to boot the system properly.
I was quite worried about how to solve this issue due to the several layers of complexity (GPT, LUKS and LVM), however everything went quite smoothly. The key part is to boot from a USB flash drive and then mount the boot and root partitions to recreate the ramdisk.
Thanks for reading. Spot an error or want to explain something better, feel free to send me a PR.
One of the problems of keyservers (in my opinion) is deleting old keys. I pretty much like the idea of the PGP keyserver which periodically verifies, by sending an email, whether the address is still in use, and therefore maintains that uid of the key. The only problem is that the PGP server is centralized and does not use FLOSS standards.
I understand that deleting a key in a distributed environment is hard, and maintaining a history of deletions, as most keyservers actually do, is probably still a good solution. However, since I don't use GPG very often and I am not publishing my key yet, I wanted to try PKA.
This tutorial is made of two basic parts, first creating a TXT DNS record and then verifying that the key gets downloaded properly.
I assume that you have gpg installed and that you know the basic idea of how it works: what private and public keys are for and how to use them.
To create the TXT DNS record you will need to know the fingerprint of the key. To do so:
$ gpg --list-keys vonpupp@keybase.io
pub rsa2048/0x536814BF4871A220 2016-11-12 [SC] [expires: 2018-11-12]
Key fingerprint = F0B9 B3FB 25E9 1209 728E 4844 5368 14BF 4871 A220
uid [ultimate] Albert De La Fuente <vonpupp@keybase.io>
uid [ultimate] Albert De La Fuente (Social email address) <vonpupp@gmail.com>
uid [ultimate] Albert De La Fuente (Main email address) <mail@albertdelafuente.com>
uid [ultimate] Albert De La Fuente (Haevas email address) <albert@haevas.com>
uid [ultimate] Albert De La Fuente (Academic email address) <albert@ime.usp.br>
sub rsa2048/0xE2977BF3F82AB971 2016-11-12 [E] [expires: 2018-11-12]
In my case, the fingerprint is F0B9 B3FB 25E9 1209 728E 4844 5368 14BF 4871 A220, or 0x536814BF4871A220 for short.
Then you need to export the key with:
$ gpg --export -a 0x536814BF4871A220 > public-0x536814BF4871A220.asc
Then create a TXT record where:
Name part: composed of mailbox._pka.albertdelafuente.com. So for instance, if my email is long-anti-spam-email-address@albertdelafuentedotcom, then the name part should be long-anti-spam-email-address._pka.albertdelafuente.com.
Text data: contains the fingerprint and the URL where to download the key. In my case:
"v=pka1;fpr=F0B9B3FB25E91209728E4844536814BF4871A220;uri=http://albertdelafuente.com/media/files/public-0x536814BF4871A220.asc"
Do not forget to upload the key so it matches the uri in the TXT record. In my case:
http://albertdelafuente.com/media/files/public-0x536814BF4871A220.asc
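The record can be assembled mechanically from the email, fingerprint and URL. Here is a small sketch; the helper function is mine, not part of any GPG tooling:

```python
def pka_txt_record(email, fingerprint, uri):
    """Build the PKA DNS record name and TXT value for a key."""
    mailbox, domain = email.split("@")
    name = "{}._pka.{}.".format(mailbox, domain)
    # PKA expects the fingerprint without spaces
    value = "v=pka1;fpr={};uri={}".format(fingerprint.replace(" ", ""), uri)
    return name, value

name, value = pka_txt_record(
    "mail@albertdelafuente.com",
    "F0B9 B3FB 25E9 1209 728E 4844 5368 14BF 4871 A220",
    "http://albertdelafuente.com/media/files/public-0x536814BF4871A220.asc",
)
print(name)   # mail._pka.albertdelafuente.com.
print(value)  # v=pka1;fpr=F0B9B3FB...;uri=http://albertdelafuente.com/...
```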
Once done that you can download the key (just for testing purposes) on another computer or VM as follows:
$ echo "Test message" | gpg --auto-key-locate pka -ear mail@albertdelafuente.com
In my case this will prompt you for a confirmation, since my primary uid does not match the email. This is done on purpose, because I have read that some spammers use keyservers to harvest valid email addresses; therefore I always use Keybase as my primary uid.
You will see something similar to this as the output:
gpg: directory `/home/vagrant/.gnupg' created
gpg: new configuration file `/home/vagrant/.gnupg/gpg.conf' created
gpg: WARNING: options in `/home/vagrant/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/home/vagrant/.gnupg/secring.gpg' created
gpg: keyring `/home/vagrant/.gnupg/pubring.gpg' created
gpg: requesting key 4871A220 from http server albertdelafuente.com
gpg: /home/vagrant/.gnupg/trustdb.gpg: trustdb created
gpg: key 4871A220: public key "Albert De La Fuente <vonpupp@keybase.io>" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
gpg: automatically retrieved `mail@albertdelafuente.com' via PKA
gpg: F82AB971: There is no assurance this key belongs to the named user
pub 2048R/F82AB971 2016-11-12 Albert De La Fuente <vonpupp@keybase.io>
Primary key fingerprint: F0B9 B3FB 25E9 1209 728E 4844 5368 14BF 4871 A220
Subkey fingerprint: 7A10 07B4 3F49 5317 5DE0 52E8 E297 7BF3 F82A B971
It is NOT certain that the key belongs to the person named
in the user ID. If you *really* know what you are doing,
you may answer the next question with yes.
Use this key anyway? (y/N) y
-----BEGIN PGP MESSAGE-----
Version: GnuPG v1
hQEMA+KXe/P4KrlxAQf/d1yxYFBSPs0RKHJ98w+s82jK25R/IXCiFNe6BkX+oyp+
uh+4AObx93SuJ/ryHlthHQmnpid4BQWmhmCksiAH+xD1xrlrCDIsNQfJ5+aPQXjz
+Z6iKrWy8Lk13i6u3wgMZuk2eKN9Z1ppi15arXhFc93cta5p5K5tAH7CwMd5zP93
r7wgI2Jff+x3erN0zbkJ2PZgDrHZVLVWyOnwgRBw12N8El3L8i6JFbNY+g25AMUm
MMCPSTit8ILsFoPtkrJEOdq5p5aCw3dvIVSzmxflMJEsgqO+Per+KxtMehaBF5qX
I2TzltcgjlisSJ3rcBtjpm12rSVJrPs4BG2UKz0w6tJIAbF0FLlWXe8zMJMK1E3Q
BQ7y/gjTduiuuD++qyIxqWCoLCgHixvP4WiTPbbKvoXl4BP8Bf1ED9M/0Cyss2NI
tW7vVlLcXRQb
=gdKa
-----END PGP MESSAGE-----
As you can see it is not that hard to publish a public key via DNS. You need to export your key to a file and upload it and then create a TXT record relating the fingerprint and the location of the key. You may also publish your key on a keyserver also since not everybody retrieve the keys over DNS.
Thanks for reading. Spot an error or want to explain something better, feel free to send me a PR.
I have been using LVM on top of LUKS for more than 2 years without problems. However, since I hadn't had the chance to troubleshoot these technologies before, I was quite concerned about what would happen when I ran into problems. Today my main computer didn't boot up and I had to fix it manually. This post shows the steps I performed to recover my system.
cryptsetup luksOpen /dev/sda2 cryptlvm
e2fsck -f /dev/mapper/storage-home
resize2fs -p /dev/mapper/storage-home 200g
e2fsck -f /dev/mapper/storage-home
(lvdisplay)
lvreduce -L -15.68G /dev/storage/home
(lvdisplay)
[lvresize -l +100%FREE /dev/storage/root]
e2fsck -f /dev/mapper/storage-root
resize2fs /dev/mapper/storage-root
cryptsetup luksClose /dev/sda2 cryptlvm
### NOT NEEDED
(pvdisplay)
pvresize --setphysicalvolumesize 218G /dev/mapper/cryptlvm
/dev/mapper/cryptlvm is active and is in use
type: LUKS1
cipher: aes-xts-plain64
keysize: 256 bits
device: /dev/sda2
offset: 4096 sectors
size: 487981391 sectors
mode: read/write
NEW_SECTORS = EXISTING_SECTORS * NEW_SIZE_IN_GB / EXISTING_SIZE_IN_GB
487981391 * 200.7 / 232.7 = 420876085
(487981391 - 15*1024*1024*2 = 456524111)
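The sector arithmetic above can be sanity-checked in a couple of lines (rounding down, since cryptsetup takes an integer sector count):

```python
# New size in sectors = existing sectors scaled by the GB ratio
existing_sectors = 487981391
new_gb, existing_gb = 200.7, 232.7

new_sectors = int(existing_sectors * new_gb / existing_gb)
print(new_sectors)  # 420876085
```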
cryptsetup -b 420876085 resize cryptlvm
(cryptsetup status cryptlvm)
parted /dev/sda
resize
This post is about using Pacserve and Powerpill on Arch Linux. On a network of several Arch Linux boxes, it is possible to configure a Master server which holds a copy of all the packages installed on that host and keeps it up to date frequently. The other hosts on the network can be configured as Slaves, and they will download the packages from the Master whenever possible instead of directly from the Internet. The objective is to save bandwidth and time when using the package manager.
The documentation of Pacserve explains how it works. This image summarizes well the behavior.
Pacserve by itself seems very interesting; however, I could not make it work properly. I think it has something to do with my router's UDP multicast forwarding support (see openwrt-multicast). Nevertheless, I could use it with Powerpill, which is a pacman wrapper that allows parallel and segmented downloads through Aria2 and Reflector. This combo sounds even better, so I will focus on how to install and use the three tools together.
# Just in case the testdb does not exist
gpg --list-key
# Receive xyne gpg key
gpg --recv-keys 1D1F0DC78F173680
# Install the packages
yaourt -Sy --noconfirm reflector pacserve powerpill
After everything is installed:
# Enable opening the ports
sudo systemctl enable pacserve-ports.service
# Enable pacserve
sudo systemctl enable pacserve.service
# Start pacserve
sudo systemctl start pacserve.service
For troubleshooting purposes the services can be started manually. See the Troubleshooting section if you have problems.
Edit the file /etc/powerpill/powerpill.json to include the `pacserve`.`server` attribute:
{
"aria2": {
"args": [
"--allow-overwrite=true",
"--always-resume=false",
"--auto-file-renaming=false",
"--check-integrity=true",
"--conditional-get=true",
"--continue=true",
"--file-allocation=none",
"--log-level=error",
"--max-concurrent-downloads=100",
"--max-connection-per-server=5",
"--min-split-size=5M",
"--remote-time=true",
"--show-console-readout=true"
],
"path": "/usr/bin/aria2c"
},
"pacman": {
"config": "/etc/pacman.conf",
"path": "/usr/bin/pacman"
},
"pacserve": {
"server": "http://192.168.56.6:15678"
},
"powerpill": {
"ask": true,
"reflect databases": false
},
"reflector": {
"args.unused": [
"--protocol",
"http",
"--latest",
"50"
]
},
"rsync": {
"args": [
"--no-motd",
"--progress"
],
"db only": true,
"path": "/usr/bin/rsync",
"servers": []
}
}
In this example I configured two VMs using Vagrant with an internal network. The server runs on 192.168.56.6 on port 15678 (the default port).
After this we can use Powerpill as a replacement for Pacman, as follows:
# System upgrade
sudo powerpill -Syu
# Install a package
sudo powerpill -S wget
When Powerpill is used, it will check the configured Pacserve server for packages before downloading from the regular mirrors.
As I said before, I could not make it work using multicast, so I had to start Pacserve on the Master host manually:
Input:
pacserve --multicast
Output:
PacserveServer
PID 1048
Addresses
lo: 127.0.0.1
eth0: 10.0.2.15
enp0s9: 192.168.56.6
Port 15678
Multicast Address all interfaces
Multicast Port 15679
Multicast Group 224.3.45.67
Multicast Interval 5m
Multicast Interfaces all
Filelist None
Filterlist None
MOTD None
Upload Directory None
Paths None
Static Peers None
Press ctrl+C to exit.
[2015-10-23 05:54:54 AEDT] INFO: announcing presence by multicast (group: 224.3.45.67)
[2015-10-23 05:54:54 AEDT] INFO: announcement sent via all interfaces
Assuming the server is running with the IP 192.168.56.6, you can manually start Pacserve on the Slave host(s) as:
Input:
pacserve --multicast --peer "http://192.168.56.6:15678"
Output:
PacserveServer
PID 1102
Addresses
lo: 127.0.0.1
eth0: 10.0.2.15
enp0s9: 192.168.56.7
Port 15678
Multicast Address all interfaces
Multicast Port 15679
Multicast Group 224.3.45.67
Multicast Interval 5m
Multicast Interfaces all
Filelist None
Filterlist None
MOTD None
Upload Directory None
Paths None
Static Peers http://192.168.56.6:15678
Press ctrl+C to exit.
[2015-10-23 05:56:00 AEDT] INFO: announcing presence via POST to http://192.168.56.6:15678
[2015-10-23 05:56:00 AEDT] INFO: POSTing to http://192.168.56.6:15678/ [type: nudge]
[2015-10-23 05:56:01 AEDT] INFO: announcing presence by multicast (group: 224.3.45.67)
[2015-10-23 05:56:01 AEDT] INFO: announcement sent via all interfaces
After starting the Slave host you should get the following message on the Master console:
[2015-10-23 05:56:00 AEDT] INFO: added http://192.168.56.7:15678/ (POST)
[2015-10-23 05:56:00 AEDT] INFO: 192.168.56.7 "POST / HTTP/1.1" 200 -
Since I'm not very familiar with the use case data itself, I looked for a way to apply the same concepts to other areas, and ended up experimenting with my Facebook and LinkedIn networks. I learned many things and the results were really interesting.
I first started with my Facebook contacts, and this was the initial result.
Afterwards I applied a community detection algorithm (Louvain) with a specific parameter, and I could identify 63 groups / communities.
Can you see them? Well… neither can I… So I applied an atlas layout algorithm to make it visually friendly, and this is the result.
It's interesting to notice that almost 48% of my contacts are within the three biggest groups (20.77%, 14.46% and 12.52%). There are also contacts (that multicolored group in the middle) whose connections I couldn't retrieve (probably because of a privacy setting). Of course the generated graphics are anonymized, but I am able to identify each group; for instance, the bottom-middle blue group are my European colleagues from my last job. The top-right cyan group are some friends from FLOSS communities. Pretty cool, huh?
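Louvain itself needs a proper graph library, but the flavor of the exercise, grouping contacts and measuring how much of the network the biggest groups cover, can be sketched with plain connected components (a much cruder notion of "community"; all names below are made up):

```python
from collections import defaultdict

def connected_groups(edges, nodes):
    """Group nodes into connected components via traversal
    (a crude stand-in for real community detection such as Louvain)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for node in nodes:
        if node in seen:
            continue
        group, queue = set(), [node]
        while queue:
            n = queue.pop()
            if n in seen:
                continue
            seen.add(n)
            group.add(n)
            queue.extend(adj[n] - seen)
        groups.append(group)
    return sorted(groups, key=len, reverse=True)

# Toy contact list: two friend clusters plus one isolated contact
nodes = ["ana", "bob", "cal", "dee", "eli", "flo"]
edges = [("ana", "bob"), ("bob", "cal"), ("dee", "eli")]
groups = connected_groups(edges, nodes)
print([sorted(g) for g in groups])  # [['ana', 'bob', 'cal'], ['dee', 'eli'], ['flo']]
# The largest group holds 3 of 6 contacts, i.e. 50% of the network
```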
On the other hand, getting the LinkedIn results wasn't that straightforward: I had to deal with OAuth and the LinkedIn API web service directly from Python. These are the results of my LinkedIn network.
Some notable groups at first glance: the bottom-left cyan group are again my European colleagues from my last job, and the green group are my Latin American colleagues from the same job.
This is an amazingly powerful analytics tool that could be used in many areas; probably the most notable (nowadays) are marketing and social networking. At the same time it's quite scary to see how our privacy goes away. I don't think it would be hard to track somebody given the right information. If I could do this just by examining the relationships, imagine what could be possible with some extra data.
He commented that patents are like walking in a minefield, but worse: a mine, once it blows up, doesn't blow again, while patents keep blowing up on and on. He later highlighted some different scenarios.
You might think that you had a great idea and you want to patent it so no one will steal it. Later a big company comes in and develops something around your idea. Then something like this will happen…
> (You) You go there and say: Hey this idea is mine, and I have the patent so you cannot use it!
>> (Big company) Ohh, what a shame… But let me remind you that within your idea you used some ideas similar to these three patents of ours…
> (oh oh)
>> …and among our thousands of patents we can find others that your idea might violate as well…
> ehmmm
>> …so that no one gets hurt, let's make a deal: you give us the right to use your patent and we won't sue you.
> Well… You evaluate (in silence), remember that they have some very good lawyers and a lot of money as well… and you end up saying: Okay, you can use it.
>> Great. Thanks =D
Conclusion: The big company wins.
The truth is that a patent won't protect you unless you find an infringement on the other side that is covered by it. A minor company won't sue you anyway, since they don't have the resources, and a big one will have very good lawyers both to sue you and to defend themselves, so it's an endless legal battle and an unnecessary waste of money.
He also pointed out that a patent could in fact be harmful if, when you try to defend yourself, you don't know how to use it properly, or something is wrong within the patent.
A never-ending legal story if someone attacks you; or, in the best scenario, nobody attacks you, so you didn't really need the patent in the first place.
Later he also made an analogy between software and music which I personally found very interesting: “Imagine that when Beethoven was alive the whole patent idea arose and somebody patented parts of musical pieces. Then some beautiful compositions would never have been made. It is easier to compose for the beauty of doing so than to compose worrying about getting sued.”
The whole speech makes a lot of sense while patents in software don’t.
]]>