WoW linguistics is just fascinating March 17, 2014 Posted by anywhereinfo in search, spellchecker.
Tags: search, spell-checker
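Frederick J. Damerau, after whom the “Damerau-Levenshtein distance” used in spelling correction is partly named, postulated that 80% of all human misspellings are single-character mistakes of one of the following four kinds: insertion of a character, deletion of a character, substitution of one character for another, or transposition of two adjacent characters.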
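As a rough illustration of how a spell checker can use this idea, here is a minimal Python sketch of the restricted variant of the Damerau-Levenshtein distance (often called optimal string alignment). It counts insertions, deletions, substitutions and transpositions of two adjacent characters. The function name `osa_distance` is my own, and this is a sketch under those assumptions, not a production spell checker.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = edits needed to turn the first i chars of a into the first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]
```

With this, `osa_distance("teh", "the")` is 1, because a single transposition fixes the word. A spell checker could treat dictionary words at distance 1 from the typed word as the first candidates.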
SOA Antipatterns January 21, 2014 Posted by anywhereinfo in SOA.
Key Issues: Below are some key issues that I have faced while implementing service-based architectures.
- Underestimating time, budget and effort: This seems obvious, since most IT projects are hard to estimate, but it is particularly important when implementing SOA. SOA requires a top-down analysis of the current state: analysis of existing applications to map which business function is met by how many applications; analysis of the enterprise’s data stores to find redundant stores and different representations of the same business entity; analysis of existing infrastructure to determine whether it is adequate for the distributed nature of SOA and the extra traffic it generates; and analysis of how nonfunctional requirements such as security and monitoring are met across the enterprise, so that they can be standardized.
All of the above requires funding, resources and time that cannot be attributed to a particular business project.
- Adoption of SOA without a proven reference architecture: A reference architecture should provide
- Architecture blueprints
- Technology stack
- To create a reference architecture, a representative set of business requirements should be chosen that can represent key technical challenges. A POC would be required to make build vs buy decisions or to determine the right set of technologies. Examples of build vs buy vs adopt open source and other POC decisions:-
- Determine ESB framework/vendor
- Determine HA architecture for ESB
- Determine caching architecture
- Determine Security frameworks/vendor
- Determine service monitoring solutions
- Determine SOA governance frameworks/vendor
- Determine service implementation approach such as SCA vs Java POJO
- Determine service monitoring tools
- Determine service testing tools
- Jumping into identifying services without understanding business processes and their inter-relationships. This can result in a set of services that is not aligned with the business architecture, and consequently “Business Agility”, the ability to create new products from existing services, is hampered. My recommendation is to follow IBM’s SOMA (Service-Oriented Modeling and Architecture) approach, which lays out guidelines for identifying and categorizing services.
- Not identifying the right scope for the service. Too coarse grained impacts performance, especially in constrained environments such as mobile. Too fine grained results in a large number of services, which become a maintenance and transactional nightmare.
- Not isolating business services from technical bindings such as SOAP, HTTP, JMS etc. A lot of companies that wrote services without such isolation had a hard time adapting to newer architectural approaches such as REST.
- Performance issues. For example, should each and every service be a remote process or should a component based service model be adopted where services can be collocated within a process.
- Not thinking about exception handling and transaction boundaries while designing services. Distributed transactions should be avoided where possible.
- Not planning for service governance such as SCM, service lifecycle process, service retirement and service versioning. Appropriate checks such as architecture review and design review are required during the software development lifecycle to ensure service reuse and that the service will meet its SLA. A service directory is needed that allows services to be easily located and provides documentation for service usage, including a contact for the development team that owns the service.
- Lack of infrastructure and network capacity planning. With distributed services, network traffic tends to increase dramatically.
- Lack of service monitoring. Services are remote and reusable. Downtime or degradation of a service’s SLA can impact many clients. Hence, it is critical to implement service monitoring.
- Implementing enterprise SOA without mapping business process and existing applications. As a consequence, rather than elimination/merging of duplicate functionality across applications, SOA creates yet another duplication.
- Not scoping out entity Harmonization if different representations exist between applications such as user identity, product definition etc. The lack of data ownership results in complex data integration, which ultimately hinders building new products quickly.
- Organization structure change: Typically business divisions have their own vertical development silos, whereas SOA typically looks horizontally across the enterprise. To adopt SOA, existing teams need to be aligned with services as opposed to business divisions. This can cause quite a stir, as it may imply a change in responsibilities.
- Not identifying correct testing tools for SOA. SOA brings some unique challenges to testing such as
- Non UI Components
- Tools that can cope with subscriptions to ESB
- Tools that understand security protocols such as OAuth, SAML etc
- Service Exception handling
- End to End testing
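To make the point above about isolating business services from technical bindings more concrete, here is a minimal Python sketch. All names in it (`PricingService`, `JsonBinding`, `QueueBinding`) are hypothetical and not taken from any real SOA stack. The idea is only that the business logic knows nothing about the transport, so supporting a new binding means writing a new thin adapter.

```python
import json
from dataclasses import dataclass


@dataclass
class Quote:
    product_id: str
    price: float


class PricingService:
    """Business service: plain Python, no knowledge of SOAP, HTTP or JMS."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def get_quote(self, product_id: str) -> Quote:
        if product_id not in self._prices:
            raise KeyError(product_id)
        return Quote(product_id, self._prices[product_id])


class JsonBinding:
    """Hypothetical adapter that a REST-style endpoint could call."""

    def __init__(self, service: PricingService):
        self._service = service

    def handle(self, body: str) -> str:
        request = json.loads(body)
        quote = self._service.get_quote(request["product_id"])
        return json.dumps({"product_id": quote.product_id, "price": quote.price})


class QueueBinding:
    """Hypothetical adapter for a message-style transport (plain text in and out)."""

    def __init__(self, service: PricingService):
        self._service = service

    def handle(self, message: str) -> str:
        quote = self._service.get_quote(message.strip())
        return f"{quote.product_id}={quote.price}"
```

Both adapters wrap the same `PricingService` instance, so moving from one binding to another does not touch the business code.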
2014 Family & Personal January 1, 2014 Posted by anywhereinfo in Personal.
Tags: New year resolutions
I would like to
1. Read on child psychology
2. Learn sign language
3. Spend time with my daughter to teach her how to speak and find ways to enrich her mind
4. Spend time with my son to teach him graphics and get him graphics software and get him into some sort of art school
5. Take more responsibility for house upkeep so that, if needed, I can be a single parent
6. Remove any dependence on wife
7. Keep some sort of journal (audio, video and/or digital) and record significant events, so that I can remember moments in my life when I look back a few years from now.
8. Lastly, I would like to make a new friend
Health for 2014 January 1, 2014 Posted by anywhereinfo in Personal.
Tags: New year resolutions
My first impression was to say “Do exercise” and that would be it. But in reality, my health is much, much more complex. I suffer from a plethora of problems such as
- Extreme Anxiety
- Low working memory
- High Blood pressure
It is so bad that all of them impact my every day, including my ability to work, my relationship with my wife and kids. I need to fix it in order to do anything in life.
For diabetes, I am planning to starve myself: eating small low-carb portions and keeping myself starved. My real motivation for starvation is that it allows me to cope with high stress and learn better.
For extreme anxiety, I am planning to include physical exercise and daily meditation. Also, somehow I need to lower my basal ganglia and deep limbic activation. I can try thiamine or other supplements, which can probably help. I am also planning to see a psychiatrist.
For depression, ADD and Asperger’s, I am planning to see a doctor and go from there.
For low working memory, I would like to give Lumosity a try.
2013 – 2014 plan January 1, 2014 Posted by anywhereinfo in Personal.
Tags: New year resolutions
Let’s start with a review of 2013.
- I was able to learn and use both C and C++ on a real life project.
- I was able to learn Linux networking from various perspectives such as
- Socket programming including tuning via socket options
- Packet processing options such as RPS (Receive packet steering) and RFS (Receive flow steering)
- Packet lifecycle from NIC to application
- NIC Configuration
- Linux CPU topology, especially
- NUMAD (Numa affinity daemon)
- CPU Scheduling (Real time, normal)
- Interrupts and IRQ
- A bit of Linux profiling tools such as OProfile, Valgrind (Memcheck, Cachegrind, Callgrind, Massif)
- Tuning bulk writes
- Linux shell programming
- Threads and data contention costs between various NUMA nodes
- Created a TCP and UDP tool that can determine bandwidth and saturate network
- Lastly, but most IMPORTANT, high-frequency, low-latency processing, by being careful to choose data structures and instructions which do not cause CPU thrashing.
- Learning about enterprise Identity management frameworks such as OAuth, SAML, Meta directories and various vendors in that space
- Looked briefly at SOA, EDA, REST-based services, ESB uses vs integration frameworks such as Camel or Spring Integration, and SCA
- Looked briefly at Hadoop, HDFS, Apache Flume, Apache Storm etc
- Began learning statistics for data analytics
- Began learning Python because many online courses, such as those on edX and MIT OpenCourseWare, and data analytics software use Python. But a special thanks to my son for forcing me to learn it.
- Possibility of a job as an Enterprise Architect
- Learnt a bit of Android
- Learnt a bit of continuous integration tools such as Jenkins and Chef. As part of Chef, I learnt a bit of Ruby
- Learnt a bit of Maven
- If I get a job as an Enterprise Architect, I would have to deep dive into Java, SOA, Hadoop, EDA, Data Analytics and Enterprise Security frameworks.
- Data Analytics: I would like to pick up (with bold items mandatory)
- Statistics (via online course at edX)
- Probability (via online course at edX)
- Algorithms from Data Science books
- Machine learning
- Graph theory
- Programming language R
- Linear Algebra
- Artificial Intelligence
- Python (already started)
- Some java based inference framework
- Use python for data analysis
- Hadoop: At least be aware of the stack, install it and develop a few basic examples.
- SOA from Enterprise requirements perspective (service discovery)
- Binding agnostic services
- SOA deployment such as SCA
- EDA and its role in SOA
- Take Udi Dahan’s courses
- Develop a few SOA examples using the Java EE SOA stack
- Enterprise security frameworks: Deep dive into the OAuth 2.0 and OpenID Connect protocols, along with a few examples using Google APIs
- Lastly, I would like to build upon what I learnt in 2013, that is, work towards my passion for low-latency computing using C/C++. Towards that, I would like
- Various concurrency approaches such as threads, actors and frameworks such as Intel Threading Building Blocks
- Learn assembly so that I can see the cost of mutexes, atomics etc
So now, the main question is how to prioritize my day and the year to achieve the above. Not all the priorities are clear.
- At any rate, I need to get into data analytics. I would say the minimum to achieve it is probability, statistics, data science and R
- I need a job, which is going to come in the Java world. Hopefully I can learn Hadoop and SOA
- Lastly, I would like to achieve something for my passion for low-latency programming using C/C++
Out of the above 3, the most important is #2. I need a job, which probably means Java/SOA and perhaps Hadoop. So perhaps the action plan should be
- Learn Java based SOA, Spring, Maven, Hibernate stack.
- Learn Hadoop stack
- Move towards data analytics.
Histograms December 20, 2013 Posted by anywhereinfo in Uncategorized.
Histograms are a great way to describe how a quantitative variable is distributed over all its values. For example, consider a quantitative variable such as the “salary” of people in a country. This data set is too large to list in full, so to summarize it we can construct a frequency table as shown below.
A frequency table lists the frequency (number) or relative frequency (fraction) of observations that fall in various ranges, called “class intervals.” There is also an endpoint convention associated with class intervals. If an observation falls on the boundary between two class intervals, in which class interval do we count the observation? For example, in the above frequency table, where are the people who earn 10K counted? Were they counted in the 0-10K class interval or in the 10-25K class interval? The two standard choices are always to include the left boundary and exclude the right, for example [0, 10K), except for the rightmost class interval, or always to include the right boundary and exclude the left, for example (0, 10K], except for the leftmost class interval.
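The endpoint convention is easy to get wrong in code, so here is a small Python sketch of the first choice (include the left boundary, exclude the right, with the rightmost interval closed on both ends). The boundaries are the salary brackets used in this post, in thousands; the names `EDGES` and `class_interval` are my own.

```python
from bisect import bisect_right

# Class-interval boundaries in thousands (0-10K, 10-25K, 25-50K, 50-100K, 100-150K).
EDGES = [0, 10, 25, 50, 100, 150]


def class_interval(salary_k: float) -> tuple:
    """Return the (left, right) class interval for a salary, using [left, right).

    The rightmost interval also includes its right boundary.
    """
    if not EDGES[0] <= salary_k <= EDGES[-1]:
        raise ValueError("salary outside the tabulated range")
    # bisect_right sends a value that equals a boundary to the interval on its right.
    index = min(bisect_right(EDGES, salary_k) - 1, len(EDGES) - 2)
    return EDGES[index], EDGES[index + 1]
```

Under this convention a person earning exactly 10K is counted in the 10-25K interval, and a person earning exactly 150K is counted in the 100-150K interval.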
Even though the population % values follow a pattern of an increase from 20 to 28, then decreases from 28 to 27, 27 to 18 and finally 18 to 7, the bar heights in the histogram do not follow this pattern. Instead, the bar heights decrease consistently, as shown below.
The difference in pattern arises because the class intervals on the x-axis are not of equal width. The first row of the table specifies that 20% of the population earns between 0 and 10K (a width of 10K), whereas the next row says that 28% of the population falls in the 10K-25K salary bracket (a width of 15K). Likewise, the other rows describe salary intervals of different widths: 25K-50K, 50K-100K and 100K-150K.
When drawing a histogram, height, width and area all matter. Histograms are different from bar charts. The area matters the most in histograms, whereas the height matters the most in bar charts. Thus in histograms, when we have uneven intervals such as above, the height of a bar is determined by the formula
Height × Width = Area
Now, based on our data, the area for the first row is 20% and the width of the row is 10K (0-10K). Hence the height of the bar is 2% per thousand. Calculating the heights of the other bars, we get the following table
|Salary (K)|Population (%)|Height (% per K)|
|0-10|20|2.00|
|10-25|28|1.87|
|25-50|27|1.08|
|50-100|18|0.36|
|100-150|7|0.14|
- Hence, a histogram allows the quantitative variable to be binned into unequal intervals.
- Horizontal axis must be drawn to the scale
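The bar heights follow directly from Height × Width = Area. As a quick check, here is a small Python sketch that computes them from the intervals and population percentages quoted in this post. The variable and function names are my own.

```python
# (left edge in K, right edge in K, population %) as quoted in the text
rows = [(0, 10, 20), (10, 25, 28), (25, 50, 27), (50, 100, 18), (100, 150, 7)]


def bar_heights(rows):
    """Height of each bar in % per K: area (population %) divided by width."""
    return [pct / (right - left) for left, right, pct in rows]


heights = bar_heights(rows)
for (left, right, pct), height in zip(rows, heights):
    print(f"{left}-{right}K: {pct}% over {right - left}K -> {height:.2f}% per K")
```

The percentages rise and then fall, but the computed heights only fall, which is the pattern described above.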
So the question remains: if the % of population is not represented by the height, what does the height of a bar represent? The height represents the area per unit of the x-axis, in this case the % of population per thousand of salary. Hence, the height represents the density in the interval.
The flat top of each bar, for example the flat top of the 0-10K bar, gives the impression that people are uniformly distributed over the interval, so that each 1K sub-interval of the 0-10K bar holds 2% of the population. This is not necessarily true, and the wider the interval, the less reliable the assumption.
Hence, assuming uniform distribution within a bar, the % in the sub interval = height of the bar * width of the sub interval.
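Under that uniform assumption, the share of the population in any salary range can be estimated by adding up height × overlap for every bar the range touches. The sketch below does this for the salary example; the bar heights are the values derived from the data in this post, and `estimated_share` is a hypothetical helper name.

```python
# (left edge in K, right edge in K, height in % per K) for the salary example
BARS = [(0, 10, 2.0), (10, 25, 28 / 15), (25, 50, 1.08), (50, 100, 0.36), (100, 150, 0.14)]


def estimated_share(lo_k: float, hi_k: float) -> float:
    """Estimated % of population earning between lo_k and hi_k (in thousands).

    Assumes people are spread uniformly inside each bar.
    """
    total = 0.0
    for left, right, height in BARS:
        overlap = min(hi_k, right) - max(lo_k, left)
        if overlap > 0:
            total += height * overlap
    return total
```

For example, `estimated_share(3, 4)` gives 2% (one 1K slice of the 0-10K bar), and a range such as 5K-15K combines half of the first bar with a third of the second. The estimate is only as good as the uniformity assumption.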
Stem & Leaf diagram December 19, 2013 Posted by anywhereinfo in Descriptive Statistics, Math, Statistics.
Tags: Descriptive Statistics, Math, Statistics
John Tukey invented the stem and leaf diagram to summarize data. Consider class scores such as
It can be represented in a stem and leaf diagram as
Advantages:
1. It is easy to create
2. Retains all data
3. Visually shows data distribution
Disadvantages:
1. The scores are not evenly spread, but the graph gives an illusion of them being evenly spaced. For example, the difference between 45 and 63 is not the same as the difference between consecutive 63 scores.
2. This diagram is bad for large data sets
3. If all the data is, say, in the 80’s, then there will be only one long line
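A stem and leaf diagram is simple enough to build in a few lines of Python. The sketch below assumes non-negative integer scores, uses the tens digit as the stem and the units digit as the leaf, and the function name `stem_and_leaf` is my own. The scores in the usage note are made up for illustration and are not the class scores from the original example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem and leaf diagram (stem = tens, leaf = units)."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines
```

Calling `stem_and_leaf([45, 63, 63, 63, 71, 78, 82, 85, 85, 90])` returns the lines `4 | 5`, `5 |`, `6 | 3 3 3`, `7 | 1 8`, `8 | 2 5 5` and `9 | 0`. The three 3s sitting side by side show the first disadvantage: equal scores look as far apart as different ones.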
Agile methodology and neuroscience December 14, 2013 Posted by anywhereinfo in Agile, Brain, learning.
I was reading an article about improving learning in patients with brain injuries and was amazed by how closely the strategies it outlines resemble Agile methodology.
Six basic factors have been identified as being critical to the process of learning and generalization. The first three factors
- environmental context (crowded, noisy, unfamiliar surrounding such as airport terminal vs noisy, crowded BUT familiar environment such as coffee shop)
- nature of the task (complexity of task, ability to relate task with previous experiences)
- and learning criteria (key performance criteria to assess if learning was successful)
are external to the learner.
The last three factors
- meta-cognition (Meta-cognitive skills include the ability to evaluate the difficulty of a task, predict the consequences of action, formulate goals, plan, self-monitor performance, and demonstrate self-control)
- processing strategies (selecting relevant information, prioritizing, rehearsing, categorizing information, associating, elaborating)
- and learner’s characteristics (previous knowledge, existing skills, emotions, experiences, motivation and attitudes towards task)
are internal to the learner.
A fact should be presented in multiple contexts in order to train the mind to generalize the learning so that it can be applied in multiple contexts.
It has since been argued that if what is taught is abstract and removed from the context and conditions of its application, it will be unrelated to previous experience and learned as an isolated, meaningless structure. Conversely, if what is taught is embedded in only one context, such as a dressing task, the skills learned may be accessible only in relation to that specific context. The implication is that exclusive use of either abstract tasks or functional tasks results in a decreased ability to transfer the skills learned in therapy to other situations.
Some techniques to improve meta-cognition include self-estimation, role reversal, self-questioning, and self-evaluation
Self-estimation: The patient estimates one or more of the following parameters before, during, or after completing a task:
- task difficulty (e.g., the patient is asked to rate task difficulty on a scale from “very easy, it will not require any extra concentration or effort” to “very difficult and beyond my abilities, I will not be able to complete the task even if I try hard”)
- time to complete the task
- number correct (or amount of errors)
- and amount of assistance needed (number of cues).
Initially, the patient is asked to estimate his or her performance during or immediately after performing a task. The patient’s self-assessment is compared with the actual results to help the patient evaluate his or her performance. If necessary, a scoring system is used in which the patient is assisted in keeping track of his or her score or time. When the patient can accurately assess his or her performance, he or she is then asked to predict his or her performance before performing a task. The patient’s original prediction is compared with his or her actual performance. The objective is to increase the accuracy of predictions so that they become more realistic; the emphasis is not on improving accuracy of performance.
Role reversal: The patient observes a therapist performing a task. The therapist makes errors, and the patient must identify the therapist’s errors and hypothesize why they occurred (e.g., the therapist went too fast or did not pay attention to details). The goal is to increase error detection and analysis skills.
Self-questioning: At specific times during a task, the patient is asked to stop and answer the same two or three questions.
- “How am I doing?”
- “Have I followed the directions accurately?”
The goal is to help the patient monitor performance during a task.
Self-evaluation: After performing an activity, the patient fills out a self-evaluation form to help him or her accurately assess outcome. Questions include, “Have I checked over all my work carefully?” “Have I paid attention to all the details?” “Have I crossed out or removed all of the unnecessary information?” “How confident do I feel with my results?” (e.g., “I feel 100% confident that my results are accurate”).
Relation of New Information to Previously Learned Knowledge or Skills (Code Katas)
Knowledge of and familiarity with a task affect both processing speed and strategy selection. Hence, when dealing with unknowns, first guesses are usually wrong. This reinforces the concepts of
- Prototyping to become familiar with task
- Doing the difficult task again and again. As the task becomes more familiar, your learning about the task improves
- If you are reading about or learning the task for the first time, you should carve out time to become familiar with it before deep diving and committing to it.
Information is better learned and better retained when the person can relate new information to previously learned skills or knowledge. Information that cannot be connected to experience is devoid of meaning. The learner usually makes attempts to elaborate new information and associate it with experiences to make it more meaningful.
How brain stores memories aka learning December 14, 2013 Posted by anywhereinfo in Brain.
I am reading the book “Brain Rules”. I started with Chapter 5 about “Short term memory”. As per the author, the life cycle of “declarative memory” is divided into the following four sequential steps
This is remarkably similar to the life cycle (CRUD) of data in information systems. Interestingly, an update in the brain is a violent, painful process, and perhaps the brain does not update in the typical sense of overwriting the existing data with the updated data. Rather, it seems to keep an event log and consciously or subconsciously use both old and new data about an entity to make decisions.
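The contrast between overwriting and keeping an event log is an information-systems idea, so it can be sketched in code. The following Python is only an illustration of the analogy, not a model of how the brain works, and the class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OverwriteStore:
    """Typical CRUD update: the old value is lost."""
    data: dict = field(default_factory=dict)

    def update(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data[key]


@dataclass
class EventLogStore:
    """Append-only log: every update is kept, old and new."""
    events: list = field(default_factory=list)

    def update(self, key, value):
        self.events.append((key, value))

    def read(self, key):
        # The most recent event for the key is its current value.
        for event_key, value in reversed(self.events):
            if event_key == key:
                return value
        raise KeyError(key)

    def history(self, key):
        # Older values remain available for making decisions.
        return [value for event_key, value in self.events if event_key == key]
```

After two updates to the same key, `OverwriteStore` only knows the latest value, while `EventLogStore.history` still returns both.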
Getting back to the encoding step, the author states that very little is known about it. From brain imaging, we can see various parts of the brain light up while it is trying to store the data. As per the author, encoding is like blending with the lid open. The information is literally sliced into discrete pieces as it enters our brain and splattered over different regions of the brain. This separation of a cohesive whole is very violent and pervasive.
Say you want to store the line “Cats have terrific memory”. The brain stores the context, the consonants and the vowels in different places. The question is then, how do features that are saved separately become reunited to produce the perception of a cohesive whole? It is called the “Binding problem”.
Engrams are a hypothetical means by which memory is encoded and stored as biophysical or biochemical changes in brain, in response to external stimuli. The existence of engrams is posited by some scientific theories to explain the persistence of memories in the brain and the “how” the brain stores the memories. The existence of neurologically defined engrams is not significantly disputed, though their exact mechanism and location has been a focus of persistent research for many decades.
The author points to research that gives only a high-level correlation between encoding memories and spatial awareness. The study is related to encoding problems faced by people with Bálint’s syndrome.
Simultanagnosia is the inability to perceive simultaneous events or objects in one’s visual field. Victims of Bálint’s syndrome perceive the world erratically, as a series of single objects rather than seeing the wholeness of a scene.
Blue Brain Project December 2, 2013 Posted by anywhereinfo in Brain.
I came across this interesting effort to create a brain model called Blue Brain Project.
The ultimate goal of the Blue Brain Project is to reverse engineer the mammalian brain. To achieve this goal the project has set itself four key objectives:
- Create a Brain Simulation Facility with the ability to build models of the healthy and diseased brain, at different scales, with different levels of detail in different species
- Demonstrate the feasibility and value of this strategy by creating and validating a biologically detailed model of the neocortical column in the somatosensory cortex of young rats
- Use this model to discover basic principles governing the structure and function of the brain
- Exploit these principles to create larger more detailed brain models, and to develop strategies to model the complete human brain